15 posts tagged “newcastle university”
Congratulations, Bug Busters! You didn't just get a gold star, you got a gold award!
Though I was not involved, many of my friends were part of the Newcastle University iGEM 2008 team, either as supervisors or students. You can read more on the Newcastle University iGEM entry wiki page. Of the 84 teams competing, only 16 won gold medals, including, from the UK, Edinburgh, Imperial and Newcastle.
From the overview of the team's wiki page:
"We aimed to develop a diagnostic biosensor for detecting pathogens. We wanted this to be cheaply and readily available for deployment in areas where access to medical resources, such as refrigeration and sophisticated laboratories, is limited or absent. We chose to use Bacillus subtilis as a method of delivery due to its ability to sporulate. The sensor bacteria could then be dried down as spores, which are very stable and extremely resilient to hostile environmental conditions, and rehydrated when required. The ambient temperature of much of the developing world is ideal for the growth of Bacillus spp. without the use of incubation equipment.
Gram-positive bacteria communicate using quorum communication peptides. Research has shown that these peptides are extremely strain-specific. We chose to engineer B. subtilis 168 to detect four Gram-positive pathogens by their quorum communication peptides. The different combinations of quorum communication peptides would be sensed by the engineered bacterium, and this signal converted into a visual output as fluorescent proteins such as mCherry, GFP, CFP and YFP." Read more.
Well done!
P.S. Looks like kudos to my old alma mater, Rice University, too! Congrats!
There are three bioinformatics jobs (one in pure bioinformatics, one in network analysis, and another in modelling/mathematical biology) currently available within CISBAN, an interdisciplinary centre studying the systems biology of ageing and nutrition. The full particulars are posted both on Nature Jobs and on the Newcastle University Job Vacancies web pages.
Below are links to the various job advertisements, as well as summaries of the jobs themselves. This is a summary of the three Nature Jobs postings, put together on a single page for easy perusal. The closing date for all of these positions is 11 January 2008. This is a great opportunity, though I may be speaking from a biased perspective as I work at CISBAN and find it an interesting and challenging workplace.
Centre for Integrated Systems Biology of Ageing and Nutrition, Institute for Ageing and Health
Research Positions
Level F £25,134 – £32,796 p.a.
Level G: £33,779 – £40,335 p.a.
We seek scientists to join CISBAN, an exciting new research centre established following a major award (£6.4m) from BBSRC and EPSRC, to participate in studies of the mechanisms responsible for ageing and how they are affected by nutrition. Ageing is recognised internationally as a ‘grand challenge’ and is a field prioritised for growth. This post offers opportunities to work in an intensely multidisciplinary, world-class centre and contribute to the development and application of systems science.
Research Associate (Bioinformation/Computing Scientist – Applications)
To develop and maintain the computing software and hardware infrastructure for systems biology, including a central web portal integrating applications for data capture, storage and visualisation and high performance computing systems and databases, including a large Linux cluster.
Job reference: A1091R
Posts are tenable until 30 September 2010.
Enquiries for the post may be directed to Dr Anil Wipat, School of Computing Science (email: anil.wipat@ncl.ac.uk). Further particulars for this post can be found on the University’s web page at http://www.ncl.ac.uk/vacancies/list.phtml?category=Research.
Applications should be submitted by 11 January 2008 to Professor Tom Kirkwood, CISBAN Director, Institute for Ageing and Health, Henry Wellcome Laboratory for Biogerontology Research, Newcastle University, Newcastle upon Tyne NE4 6BE (email: tom.kirkwood@ncl.ac.uk).
Committed to Equal Opportunities
Centre for Integrated Systems Biology of Ageing and Nutrition, Institute for Ageing and Health
Research Positions
Level F £25,134 – £32,796 p.a.
Level G: £33,779 – £40,335 p.a.
We seek scientists to join CISBAN, an exciting new research centre established following a major award (£6.4m) from BBSRC and EPSRC, to participate in studies of the mechanisms responsible for ageing and how they are affected by nutrition. Ageing is recognised internationally as a ‘grand challenge’ and is a field prioritised for growth. This post offers opportunities to work in an intensely multidisciplinary, world-class centre and contribute to the development and application of systems science.
Research Associate (Bioinformatician – Network Analysis)
To research and develop novel methods of representing and integrating molecular and cellular data as networks and apply this methodology to identify novel proteins and elucidate novel pathways involved in the process of cellular ageing and senescence.
Job reference: A1090R
Posts are tenable until 30 September 2010.
Enquiries for the post may be directed to Dr Anil Wipat, School of Computing Science (email: anil.wipat@ncl.ac.uk). Further particulars for this post can be found on the University’s web page at http://www.ncl.ac.uk/vacancies/list.phtml?category=Research.
Applications should be submitted by 11 January 2008 to Professor Tom Kirkwood, CISBAN Director, Institute for Ageing and Health, Henry Wellcome Laboratory for Biogerontology Research, Newcastle University, Newcastle upon Tyne NE4 6BE (email: tom.kirkwood@ncl.ac.uk).
Committed to Equal Opportunities
Centre for Integrated Systems Biology of Ageing and Nutrition, Institute for Ageing and Health
Research Positions
Level F £25,134 – £32,796 p.a.
Level G: £33,779 – £40,335 p.a.
We seek scientists to join CISBAN, an exciting new research centre established following a major award (£6.4m) from BBSRC and EPSRC, to participate in studies of the mechanisms responsible for ageing and how they are affected by nutrition. Ageing is recognised internationally as a ‘grand challenge’ and is a field prioritised for growth. This post offers opportunities to work in an intensely multidisciplinary, world-class centre and contribute to the development and application of systems science.
Research Associate (Modeller/Mathematical Biologist)
To develop models of molecular and cellular mechanisms of ageing and to explore links between ageing, development and evolution from a life-course perspective. This post will also involve collaboration within the EU Network of Excellence LifeSpan, linking development and ageing.
Job Ref: A1092R
Posts are tenable until 30 September 2010.
Enquiries for the post may be directed to Professor Tom Kirkwood, Institute for Ageing and Health (email: tom.kirkwood@ncl.ac.uk). Further particulars for this post can be found on the University’s web page.
Applications should be submitted by 11 January 2008 to Professor Tom Kirkwood, CISBAN Director, Institute for Ageing and Health, Henry Wellcome Laboratory for Biogerontology Research, Newcastle University, Newcastle upon Tyne NE4 6BE (email: tom.kirkwood@ncl.ac.uk).
Committed to Equal Opportunities
The two-day FuGE Users' workshop was organized by Norman Paton and held at the University of Manchester. It was great fun, and if you just want the short summary of my time there, then just know that there was loads of enthusiasm for FuGE as well as interesting talks, both by communities who were already extending FuGE, and by developers who were already building tools and databases based on it. There were only a dozen or so people, which kept the discussions lively but neither too long nor too divergent. The workshop dinner was great, though the trip to the restaurant was correctly described by one of the attendees as an Odyssey. For more information on the social aspect of the FuGE workshop, please have a look at Phil Lord's humorous posting on the matter. For another post on the workshop, see the peanutbutter Bioinformatics blog by Frank Gibson.
If you wish to read the longer notes rather than the short summary, then please read on!
Please note that these are my own notes, and are in no way considered to be an "official" FuGE report on the workshop. As such, any errors or inconsistencies are entirely my own. However, if you see a problem with this post, then please let me know, and I'll fix it!
The objectives of the workshop were to share and document experiences in the use of FuGE, to identify good practices, to document guidelines, and to make these experiences and guidelines known. Hopefully, the result will be a paper that documents the current users' experiences and increases communities' understanding of FuGE. It should help people who have read the Nature Biotechnology paper and want to use FuGE, but aren't completely sure what to do next.
Attendees were:
Peter Wilkinson, from Montreal, who was interested in FuGE for flow cytometry.
Khalid Belhajjame: works with Norman Paton in Manchester, and who may soon be a full-time developer of FuGE
Javier Santoyo: University of Edinburgh, part of a consortium trying to develop standards for RNAi work
Andy Jones: one of the original developers of FuGE, from Liverpool, developed GelML with Frank Gibson.
Heiko Rosenfelder: German Cancer Centre at Heidelberg, here as part of MIACA, and wants to use FuGE for the cellular assay format.
Martin Eisenacher: Proteome Centre (mzML and analysisXML) and wants to use FuGE
Phil Lord, Frank Gibson: via CARMEN, wants to use FuGE. Frank also developed GelML with Andy Jones.
Neil Wipat, Matt Pocock, Allyson Lister: We use FuGE in our internal application for storing HT data. Matt and Allyson also involved in OBI.
Leandro Hermida: SIB, they're part of a group that is making SystemsX. Want to use FuGE to store and manage the data. Also want to make an extension of FuGE for deep sequencing technologies.
Norman Paton: originally from proteomics field, but developer of FuGE and organizer of the workshop.
Session 1: Experiences Using and Extending FuGE
GelML – Frank Gibson and Andy Jones
GelML is a FuGE extension that has passed the PSI approval process. PSI defines community standards for data representation in proteomics. There are a variety of working groups, including gel electrophoresis, mass spectrometry, protein modifications, etc. Within the Gel WG there are three specifications: MIAPE-GE (minimum checklist for reporting gel electrophoresis experiments), sepCV (controlled vocabulary), and GelML (data transfer format, based on FuGE).
GelML covers the model of a gel, 1-D and 2-D GE, other (N-dimensional) GE's, sample loading, electrophoresis, detection, image acquisition, the excision of locations on gels, and SubstanceMixtureProtocol and SubstanceAction.
The first extended FuGE class described was the Material abstract class. The first of such classes is the Gel class. A Gel has Dimensions, MeasuredMaterial, and others. You use the “Measurement” package to describe the characteristics of the Gel. Measurements include PercentageAcrylamide, while information about the gel (i.e. if purchased, from where), the Dimensional Units, and CrossLinkerRatio are all FuGE OntologyTerms. MeasuredMaterial was not originally in FuGE because it was planned that such substances could be captured by ontology terms. Rather than using named associations to GenericParameter, they tended to use either GenericParameter (with a CV term) or extended the Parameter class. This was just a design decision, and he would like to see how others do it.
Another extended FuGE class is the Protocol abstract class. The GelML SampleLoadingProtocol has an AddBufferAction which points to a SubstanceMixtureProtocol. 2DGelProtocol has a SampleLoadingAction, a FirstDimensionAction, a SecondDimensionAction (both Electrophoresis protocols), and an InterDimensionAction (for when something happens between the first and second dimension actions), and the DetectionAction.
Within the Electrophoresis protocols there is the ElectrophoresisStop (an Action), which contains a StopTime. This is a TimeParameter, which has Duration and TimePoint. They'd be really interested to see how others have modelled, or would like to model, time.
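To make the time question a little more concrete, here is a minimal sketch in Python rather than UML. The class names echo the GelML names mentioned in the talk, but the structure (a parameter that holds a duration, a time point, or both) is my own assumption for illustration and is not taken from the GelML specification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class TimeParameter:
    """Hypothetical stand-in: a time expressed as a duration, a time point, or both."""
    duration: Optional[timedelta] = None
    time_point: Optional[datetime] = None

    def __post_init__(self):
        # An empty time parameter carries no information, so reject it.
        if self.duration is None and self.time_point is None:
            raise ValueError("a TimeParameter needs a duration or a time point")


@dataclass
class ElectrophoresisStop:
    """Hypothetical Action recording when an electrophoresis run was stopped."""
    stop_time: TimeParameter


# A run stopped after two and a half hours, recorded as a duration.
stop = ElectrophoresisStop(TimeParameter(duration=timedelta(hours=2, minutes=30)))
print(stop.stop_time.duration.total_seconds())
```

Other communities might prefer to keep only one of the two forms, which is exactly the kind of design difference the speakers were asking about.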
It was also a design decision to guide people with the structure of the XML to help them know what to fill out, e.g. you must have a 2dGelProtocol.
For each case, should we extend the FuGE model or add experiment-specific semantics through the use of ontologies? I think this is a case of using both, depending on the circumstances.
They have used standard XML references within the documents. But, for instance, do we still need internal document identifiers when the ontologyURI is a globally-unique identifier anyway? They may be required if the terms are created ad hoc within the group making the XML file. What is the best way to use ontologies?
AnalysisXML – Martin Eisenacher
http://www.fp6-prodac.eu
He is a member of the ProDaC Consortium. ProDaC is a funded consortium that is meant as a “coordination action” within the 6th EU Framework Programme. Its aims are the development of international standards, standardized data submission pipelines, systematic data collection in public standards-compliant repositories, and data access for community and publication. There was a kick-off meeting of ProDaC in Long Beach in October 2006, and there have been two workshops since.
Proteomics data includes spectra (peak lists), and peptide lists. He works specifically with the MS (for peak lists and instruments, mzML) and Proteomics Informatics (for “results”, analysisXML) PSI WG's. mzML is a merger of mzData and mzXML. Perhaps this merger is one of the reasons that it is not currently FuGE-based. AnalysisXML includes annotation of search databases, search, algorithms, search parameters, instrument characteristics, peptides (peptide-spectrum link, peptide scores), proteins (protein-peptide link, protein scores, significance values, false-discovery estimation) and quantisation. In September 2007 they added comments into the UML that are passed into the XML.
They use the MagicDraw Community Edition, which is available for free. The Analysis package is subdivided into process, quantisation, and search. Process contains things that aren't directly related to the search protocol applications, but other steps such as ProteinDetermination and PolypeptideProcess. Some of the classes they have made that inherit from the Data class inside the search package include AnalysisResultSet (a set of spectra), AnalysisResult (a spectrum), and AnalysisResultItem (all peptides found for that spectrum). These are all abstract classes, whose concrete subclasses include PolypeptideResultSet, PolypeptideSearchResult, and PolypeptideResultItem.
At the moment they are assembling their own CV (to include search parameters that are most commonly used in search engines like MASCOT), but they can also use Pride CV. They use the ontology classes directly from FuGE, without extending it. This means that it fits what they need without modifications.
In analysisXML, peptides and sequences are listed only once. Different types of analyses can be stored in one file, or in separate files with external cross-references. Also, the AnalysisProtocol could be used as parameter input for search engines. However, there are many cross-references and unique identifiers that are not validated by the FuGE Schema. Further, there are external cross-references to mzML, which can be difficult if you have only local files and not public URI's. Also, sequences (just the letters) are not polypeptides (“real” molecules with modifications). Therefore, the ConceptualMolecule FuGE class is not appropriate for polypeptides, though it is suitable for sequences (though they are still able to use that class). Additionally, the ResultSet-ResultItem hierarchy does not fit all analysis types. Finally, many FuGE elements seem to have very long names that aren't always useful (but you shouldn't be typing XML manually!).
All items of the collections have unique identifiers. References to them are attributes called “..._ref”. Schema validation does not consider whether _ref links to an allowed section (or that used CV's are allowed). In mzML, for example, “semantic validation” of CV's is possible (suggested/implemented by the EBI). Are identifier checks possible? ProDaC has an online validator for mzML, analysisXML, mzData and prideXML that performs semantic validation, though the extra ontology/CV checks are only supported for mzML.
Still to do is the finalization of analysisXML, which is a deliverable for last October! They also want to provide “Quality Determination” as a process. They also want to make some use-cases and instance documents. They will have some from Matrix Science, MPC. Also, they need to finalize the CV they are using.
SyMBA – Allyson Lister
I gave this talk, so I didn't write anything about it! Instead, have a look at the SourceForge website (http://symba.sf.net):
The Integrative Bioinformatics Group, headed by Neil Wipat and part of The Centre for Integrated Systems Biology of Ageing and Nutrition (CISBAN), has developed a data archive and integrator (SyMBA) based on Milestone 3 of the Functional Genomics Experiment (FuGE) Object Model (FuGE-OM), and which archives, stores, and retrieves raw high-throughput data. Until now, few published systems have successfully integrated multiple omics data types and information about experiments in a single database. SyMBA is the first published implementation of FuGE that includes a database back-end, expert and standard interfaces, and a Life Science Identifier (LSID) Resolution and Assigning service to identify objects and provide programmatic access to the database. Having a central data repository prevents deletion, loss, or accidental modification of primary data, while giving convenient access to the data for publication and analysis. It also provides a central location for storage of metadata for the high-throughput data sets, and will facilitate subsequent data integration strategies.
Developing Flow Cytometry FuGE extensions – Peter Wilkinson
Developing MIFlowCyt. Originally, they stored the metadata and data in a single file, but their latest format (ACS) will separate these two types. They are considering having some of their data formats be in RDF as well as XML, even for those formats that will be built on FuGE – is there a good XML to RDF converter? I suppose so, as I've been able to save OWL/RDF as OWL/XML in Protege 4.
One example of their extension is Cytometer, which is a subclass of equipment. How descriptive should they get with their samples? Should it be at the entity or attribute level? For instance, there is a conceptual difference between prepared samples and “generic” materials. But why not draw an association to material and call it “sample”? They can't do that because sample has a lot of associations itself that aren't present in Material. For things like buffers and solutions, spML doesn't seem to view them as things that exist – you just talk about them in the protocol. This way you don't have to list them 1000s of times. In FC, you have to know exactly which thing is used in the protocol (e.g. they must record batch numbers). However, you could have a single buffer instance, and then in the ProtocolApplication you have a specific parameter that is modified in that particular application of the Protocol, such as the batch number.
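The last suggestion above (one buffer instance, with the batch number recorded per application of the protocol) can be sketched roughly as follows. The class and field names are hypothetical and chosen only for illustration; they are not the FuGE or flow cytometry extension classes themselves.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Material:
    """A buffer or solution that is described once and then reused (hypothetical class)."""
    identifier: str
    name: str


@dataclass
class ProtocolApplication:
    """One run of a protocol; values specific to that run live here (hypothetical class)."""
    protocol_name: str
    material: Material
    parameters: Dict[str, str] = field(default_factory=dict)


# The buffer is described a single time...
pbs = Material(identifier="buffer:pbs", name="PBS buffer")

# ...and each application of the protocol records its own batch number.
run1 = ProtocolApplication("staining", pbs, {"batch_number": "B-0417"})
run2 = ProtocolApplication("staining", pbs, {"batch_number": "B-0522"})

print(run1.material is run2.material)
print(run1.parameters["batch_number"], run2.parameters["batch_number"])
```

The trade-off is that the batch number is no longer a property of the material itself, so anything that needs it has to go through the protocol application.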
Open issues include: FuGE should reference a stable version of AndroMDA, there should be a best-practice for deciding when a Generic* class is replaced by a specific omics-type class, how is the OntologyTerm abstract class intended to be used for specific controlled lists, fitting the organism into FuGE::Bio somewhere, and versioning.
He's also trying to write a FuGE database by hand, rather than using what is generated by AndroMDA, as he needs to squeeze as much performance out of the system as he can. Much more difficult, but could conceivably be much more efficient.
Generic and Custom Extensions – Andy Jones
spML is for sample processing. SubstanceMixtureProtocol is for describing a mixture of substances, e.g. buffers and solutions and the method of their creation. Actions relate to constituents. Timings relate to constituents and volume, concentration, or mass. SetPropertyAction is a generic model to be used in conjunction with protocols where parameters may be set with associated ActionText. Their chromatography extension comprises extensions of Protocol, Equipment, and ProtocolApplication. The ChromatographyProtocol contains extensions of Parameter, has a child protocol for sample injection, and various uses of GenericActions. With ChromatographyEquipment, there are column-associated sub-components. All extensions of Chromatography equipment can have additional parameters, including specific named parameters where they are always required. Uses Equipment:make. The mobile phase of LCProtocol is described using the SubstanceMixtureProtocol. Inputs are defined with GenericMaterialMeasurement, and the outputs are either Chromatogram (ExternalData) or SeparationFraction. You can also have two-dimensional chromatography.
GenericSeparation is a protocol that uses generic models for defining the substances used to create a separation gradient and the parameters applied. In this case, the equipment defines the type of separation and criteria using ontology terms, but how do you communicate how this should be used to all of the developers? On the other hand, we don't want to have huge models. Inputs are also defined using GenericMaterialMeasurement, and the outputs are either SeparationLog (ExternalData) or SeparationFraction.
TreatmentProtocol is a simple model for treatments, intended for labelling, mixing, splitting, and washing, for example. The treatment IO in TreatmentProtocolApplication is restricted to having material inputs and material outputs only.
There seem to be three sorts of models: column-oriented, category-oriented, and completely generic protocols. Much of what is in spML might be useful for the “library of models” we've been discussing.
The generic model is very flexible for different types of separation, and could be used for LC, GC, capillary electrophoresis, rotofors etc. It is also unlikely to break if a new type of experiment is defined, and the Treatments model could potentially be useful in the context of any experiment type. Also, the generic model is much smaller, and can be used in various ways. However, this last one could be a “con” as well, because different users/implementers are likely to encode the same information in different ways. Further, a specific model can guide the user to provide specific details, e.g. for MIAPE compliance.
spML units are derived from the OBO Unit Ontology. Should FuGE extensions be allowed to have user-defined terms? It would be useful for the creation of in-house lists to populate drop-down menus.
Issues
spML
units
Below is a list of questions and suggestions that we came up with while the initial talks progressed in the first couple of sessions. Many were discussed, and some were answered, in breakout sessions later. Notes from the discussions I was a part of are included below. The unanswered points in the list may have been discussed at other breakout sessions, or may still be untouched.
Discussion on Semantic Validation and Identifiers: Khalid Belhajjame, Norman Paton, Allyson Lister, Martin Eisenacher
Identifiers and Auxiliary/Semantic Validation: Types of Validation and How Simple Support can be done.
| | Unique in Document: Property | Unique in Document: Checked by XML Tooling | Not Dangling: Property | Not Dangling: Checked by XML Tooling | Globally Unique: Property | Globally Unique: Checked by XML Tooling | Type Correct: Property | Type Correct: Checked by XML Tooling | Notes |
|---|---|---|---|---|---|---|---|---|---|
| Instances of Identifiable | yes | GP | yes | GP (yes?) | (+) | no | yes | no | See (1) |
| Ontology Terms | (#) | n/a | yes | no | yes | no | yes (not in UML) | no (^) | |
| External Data | ($) | n/a | yes | no | yes | no | yes | (*) | (*) want to know it's a file of a particular kind |
GP: Can be checked with a generic program.
All things marked GP or X could be attacked by people wanting to write a semantic validation tool.
(+) Only for some types of Globally-unique identifiers would we be able to check that they were truly unique and well-formed.
(#): Should OntologyTerm elements be unique (irrespective of their identifiers, which must be unique)? If people compare OT identifiers they may think two terms are different when in fact they are the same, and someone was sloppy when making OT elements. However, if they have linked their OT to an OntologySource then it can be checked if it is both unique in document and globally unique (if it is a logical/physical uri)? In that case, why should OS be optional at all, if custom CV's can be included in the OS.
(^): This is where the ontology mapping files come in.
($): The same argument for uniqueness of ED applies as that in OT.
(1) Will we suggest a type of identifier to use with FuGE as a best-practice?
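As an illustration of what a "generic program" (GP) check could look like for the first two properties in the table, here is a minimal Python sketch. The element names, the `identifier` attribute and the example document are assumptions made up for this example rather than real FuGE-ML; only the `_ref` suffix follows the convention described in these notes.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented example document; element and attribute names are assumptions.
DOC = """
<Collection>
  <Material identifier="m1"/>
  <Material identifier="m2"/>
  <Protocol identifier="m2"/>
  <Action identifier="a1" Protocol_ref="p9"/>
</Collection>
"""


def check_identifiers(xml_text):
    """Return (duplicated identifiers, dangling references) for one document."""
    root = ET.fromstring(xml_text)
    # Count how often each identifier value occurs anywhere in the document.
    counts = Counter(
        el.get("identifier") for el in root.iter() if el.get("identifier")
    )
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    # A reference dangles when no element in the document carries that identifier.
    dangling = sorted(
        value
        for el in root.iter()
        for name, value in el.attrib.items()
        if name.endswith("_ref") and value not in counts
    )
    return duplicates, dangling


duplicates, dangling = check_identifiers(DOC)
print("not unique in document:", duplicates)
print("dangling references:", dangling)
```

Neither check needs any knowledge of a particular community extension, which is what makes them candidates for a shared tool. Global uniqueness is a different matter, since it cannot be decided from a single file.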
Do we still need internal document identifiers when the ontologyURI is a globally-unique identifier anyway?
Should identifiers be human readable?
Do community extensions automatically have their own namespace/prefix? That is, if “sample” is used in the FC community and also in another extension, will it be problematic if you try to create a multi-omics FuGE-ML file? This is all about linkage between different FuGE-based instances (unique identifiers, both within a single document and between documents). What is the identifier an identifier of? Is every Identifiable object a “first-class citizen”? We shouldn't force all (any!) identifiers to be URI's.
Should you use a logical or physical naming scheme?
Physical naming schemes:
Are fragile
May not work for all users (i.e. if the URI points to a laptop that isn't publicly accessible)
Logical naming schemes:
Are robust, but require a greater investment of time, as they need tools that provide resolving facilities.
If locally-unique identifiers are used:
it means that you may get into trouble in the long run
If globally-unique identifiers are used:
clashes between different FuGE-ML files will be avoided
People should look this over and discover which is the best setup for their situation.
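The clash problem with locally-unique identifiers is easy to show with a toy example. The sketch below uses invented identifiers and descriptions, and it mints `urn:uuid:` names with Python's standard `uuid` module purely as one possible globally-unique scheme; the workshop did not recommend any particular scheme.

```python
import uuid

# Two hypothetical documents that each use short, locally-unique identifiers.
doc_a = {"sample1": "mouse liver", "sample2": "mouse kidney"}
doc_b = {"sample1": "yeast culture"}

# Merging them naively would silently overwrite one "sample1" with the other.
clashes = sorted(set(doc_a) & set(doc_b))
print("clashing local identifiers:", clashes)


def globalise(doc):
    """Re-key a document's objects with freshly minted urn:uuid identifiers."""
    return {f"urn:uuid:{uuid.uuid4()}": value for value in doc.values()}


# With globally-unique identifiers all three objects survive the merge.
merged = {**globalise(doc_a), **globalise(doc_b)}
print(len(merged))
```

Note that identifiers minted this way are unique but neither human readable nor resolvable, so they only address one of the questions raised above.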
If we use URI's, should URI's be resolvable? What is the scope?
Martin has a URI that points to a data file, and a (possibly locally unique) identifier that points to a spectrum within the data file. How to deal with this? Do we have a best-practice for it?
Schema Validation
Schema validation does not consider whether _ref links to an allowed section (or that used CV's are allowed). Native XML validation does not do this, but you could make a tool. In theory, the prefix before the _ref is always the name of the class. FuGE needs semantic validation.
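If the prefix before `_ref` really is always the name of the class, then a first "type correct" check is only a few lines. The following is a rough Python sketch under that assumption, using an invented document; the element names and the `identifier` attribute are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented example document: a2 points a Protocol_ref at a piece of Equipment.
DOC = """
<Collection>
  <Protocol identifier="p1"/>
  <Equipment identifier="e1"/>
  <Action identifier="a1" Protocol_ref="p1"/>
  <Action identifier="a2" Protocol_ref="e1"/>
</Collection>
"""


def check_ref_types(xml_text):
    """Report references whose target is not the class named by the attribute prefix."""
    root = ET.fromstring(xml_text)
    by_id = {el.get("identifier"): el for el in root.iter() if el.get("identifier")}
    problems = []
    for el in root.iter():
        for name, value in el.attrib.items():
            if not name.endswith("_ref"):
                continue
            expected = name[: -len("_ref")]  # e.g. "Protocol" from "Protocol_ref"
            target = by_id.get(value)
            if target is None:
                problems.append((el.get("identifier"), name, "dangling"))
            elif target.tag != expected:
                problems.append((el.get("identifier"), name, f"points at {target.tag}"))
    return problems


problems = check_ref_types(DOC)
print(problems)
```

A real tool would also have to accept subclasses (a reference to a Protocol should be satisfied by any extension of Protocol) and cope with XML namespaces, neither of which this sketch attempts.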
How should user-defined ontology terms be validated in the XML?
Discussion on Versioning: Khalid Belhajjame, Norman Paton, Allyson Lister, Phil Lord, Matt Pocock, Leandro Hermida
Is there a best way to implement versioning?
Characteristics of (SyMBA) Versioning:
Complete History of Atomic Changes
Low Cost of Updates – No Cascades
Higher Cost of Retrieval
This is actually a transaction-time database with tuple-level timestamps. In a transaction-time database, the recorded time is the time in the world of the database (when the change was stored), not the time in the real world. This contrasts with a valid-time database, where the recorded time is the real-world time at which a fact holds, which need not match the time at which you inserted it. If you don't put the timestamp in the tuple, you put it in the attribute. In this context, people have looked at the properties of update operations.
Can't just use LSID versioning because there is no specification of how the version should be updated.
SyMBA Versioning Requirements:
Preserving the semantics of the LSID
Getting exactly the version requested, and getting all versions
Nothing should disappear
This isn't necessarily versioning, or what versioning in FuGE should be.
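The characteristics and requirements listed above can be illustrated with a small append-only store. This is a sketch of the general idea (tuple-level transaction timestamps, nothing ever overwritten), written in Python with an in-memory dictionary and a counter standing in for the database clock; it is not how SyMBA itself is implemented.

```python
import itertools


class VersionedStore:
    """Append-only store: every write adds a new (transaction time, value) tuple."""

    def __init__(self):
        self._rows = {}                    # identifier -> list of (tx_time, value)
        self._clock = itertools.count(1)   # stand-in for database transaction time

    def put(self, identifier, value):
        """Cheap update: append one tuple, never touch or cascade to older ones."""
        tx_time = next(self._clock)
        self._rows.setdefault(identifier, []).append((tx_time, value))
        return tx_time

    def all_versions(self, identifier):
        """Complete history: nothing disappears."""
        return list(self._rows[identifier])

    def as_of(self, identifier, tx_time):
        """Costlier retrieval: scan the history for the version current at tx_time."""
        matches = [value for stamp, value in self._rows[identifier] if stamp <= tx_time]
        if not matches:
            raise KeyError(f"{identifier} did not exist at time {tx_time}")
        return matches[-1]

    def latest(self, identifier):
        return self._rows[identifier][-1][1]


store = VersionedStore()
t1 = store.put("exp1", {"name": "draft"})
t2 = store.put("exp1", {"name": "final"})
print(store.as_of("exp1", t1), store.latest("exp1"), len(store.all_versions("exp1")))
```

The update path is a single append, while reading a particular version means searching the history, which mirrors the low update cost and higher retrieval cost noted above.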
Leandro's Versioning
Requirements:
- Getting exactly the version requested, and getting all versions
- Nothing should disappear
Should this be done in FuGE, or in the FuGE-OM specifically? Perhaps just in the Maven build? We could put hooks in FuGE that would allow fine-grained logging. The current Audit setup does not allow linking back to previous versions unless you put the delta in free text somehow. The Audit classes may be suitable for XML, and you could make a log of such changes and roll-back (in a non-RDBMS way) to whatever version you want.
While it is clear we could make an STK that could have versioning of some type, whether or not this should be a (optionally-used) change to the OM is a much bigger thing. It is certainly a worry that versioning has to be dealt with at the application level. However, versioning at the file or XML level means multiple files, otherwise you'd have to apply a diff to a very large file.
We haven't really had the time to scan the space of options here. We could circulate a general document, and then outline what's actually been done so far. A paper would, in any case, be centered around pros/cons, and a bit less on current implementations, but definitely not say which is the “right” way to do things, as there is no single right way.
There are different technical solutions, and not all of these solutions should necessarily be provided in the model.
Discussion on Tools – Leandro Hermida, Heiko Rosenfelder, Neil Wipat, Phil Lord, Allyson Lister
What about trying to get some automatic mapping between the XML classes and the Hibernate/Spring classes?
There is a disconnect between the XSD that is generated from the XML schema cartridge and the code generated from your persistence cartridge.
This means you have to write your mapping manually.
There is a possibility that we could get hyperjaxb3 to work for this (Allyson had tried with an earlier version but it didn't work properly). Hyperjaxb3 generates both Entity POJO's and the jaxb2 classes. So, in theory you could use only the AndroMDA XSD cartridge and hyperjaxb3 for the rest. However, then you lose all the information that is present in the UML but not in the XSD.
Hyperjaxb3 uses both hibernate and ejb3 natively (you can choose). Leandro wants to work on a merged persistence/hyperjaxb3 extended cartridge, or perhaps its own cartridge. So perhaps the generation of a hyperjaxb3 cartridge.
Is there an XSLT that could be made to have a “standard” way of viewing a fuge experiment?
Khalid mentioned that it is important to allow input from the programmer in such a tool, so they can see as little or as much of the FuGE structure as you wish to present to the user.
Leandro is working on an ejb3 cartridge from the androMDA plugins project (not part of the AndroMDA core yet), and has used FuGE as a test-case. What this cartridge does is generate a mapping file and load it into any application server running hibernate, which will then generate your database. The Hibernate+Spring cartridge, by contrast, generates 1) Entity POJO's + mapping files 2) Spring DAO + DAOException + DAOImpl. With the ejb3 cartridge you get 1) ejb3-annotated Entity POJO's + DAO*.
You can use Spring, if you wish, to build your web framework. Leandro decided to instead use Seam, which is the business layer of a web framework that builds on top of ejb3. Seam then uses the JSF (Facelets) and Jboss RichFaces for the actual web UI.
To get the Seam classes, you model <> classes and then draw dependencies, which then auto-generate Seam-enabled ejb service beans. However, the xhtml files for Facelets and RichFaces have to be written manually, though AndroMDA creates the entire web/ structure and base Seam classes for you.
This doesn't answer our simple UI question.
The ejb3 cartridge has a web service (jax-ws, via soap) to your DAO's and Entity POJO's.
With MAGE, someone wrote a regular Java Swing program: you download the jar, which opens a little tabbed client that views MAGE. We could do something similar (a J2SE app to write/read FuGE-ML with a nice wizard interface).
The GSC has a lightweight XSD-to-web-form software app.
Another option is an XSLT, a style sheet that is richer than CSS, but it is a tough language to use (it converts XML to “HTML”). XSLTs don't have first-class functions, so you can't do anything generic.
We would also like to have a simple jar that takes XML as input and outputs HTML.
This means three tool types here: 1) heavyweight (already existing in SyMBA and SystemsX) 2) midweight (J2SE app to read/write with a wizard-like interface) 3) lightweight (input XML, output HTML with some simple options).
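To make the lightweight option a little more concrete, here is a minimal sketch in Python (not one of the tools discussed at the workshop, and not an XSLT) that walks any XML document and renders it as nested HTML lists. The function names, the `max_depth` option and the sample element names are assumptions made up for illustration; the sample is not real FuGE-ML.

```python
import html
import xml.etree.ElementTree as ET


def element_to_html(elem, depth=0, max_depth=None):
    """Render one XML element (and its children) as a nested HTML list item."""
    # Drop any "{namespace}" prefix so the tag reads cleanly.
    tag = elem.tag.split("}")[-1]
    attrs = ", ".join(f"{k.split('}')[-1]}={v}" for k, v in elem.attrib.items())
    label = html.escape(tag)
    if attrs:
        label += " (" + html.escape(attrs) + ")"
    text = (elem.text or "").strip()
    if text:
        label += ": " + html.escape(text)
    children = list(elem)
    # max_depth is the "simple option": it hides structure below a given level.
    if not children or (max_depth is not None and depth >= max_depth):
        return f"<li>{label}</li>"
    inner = "".join(element_to_html(c, depth + 1, max_depth) for c in children)
    return f"<li>{label}<ul>{inner}</ul></li>"


def xml_to_html(xml_text, max_depth=None):
    """Input XML text, output an HTML fragment."""
    root = ET.fromstring(xml_text)
    return "<ul>" + element_to_html(root, 0, max_depth) + "</ul>"


# Hypothetical sample document, for illustration only.
SAMPLE = """<Experiment name="demo">
  <Protocol identifier="p1">Grow cells</Protocol>
  <Material identifier="m1"/>
</Experiment>"""

print(xml_to_html(SAMPLE))
print(xml_to_html(SAMPLE, max_depth=0))
```

Because the walker is generic, it knows nothing about FuGE; a real tool would presumably want per-class templates so that programmers can choose what to show.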
Tool support for FuGE STK version 1, including a validator.
The MAGE STK includes a validator.
XML validation can be done with JAXB2 as is with the Milestone 3 STK, but longer-term we need the semantic validation tool.
Perhaps have some ontology lookup helper classes (OLS from the EBI?) to help users and developers add terms from (a certain set of?) ontologies. This may help people populate their databases, choose a term from a list on a front-end tool, etc.
Tool support for database schema / AndroMDA / Alternatives.
Dealt with in the other sections
Discussion on Challenging Constructs, including Investigation Package, Abstract Associations, and the Ontology Package - the entire group
What is the real meaning of the Investigation package? It's one of the few parts of FuGE that isn't meant to be extended.
How is the OntologyTerm abstract class intended to be used for specific controlled lists? One example is taxonomies as opposed to ontologies.
The intention is that this package would not be changed or extended by communities. Each technology would be reported in the InvestigationComponent. The Factor class actually is meant as a summary report of the factors used in the experiment. There is currently no direct link between the Factors and the protocol workflows – the detail can be recorded in other places in FuGE. It's a summary and duplication of the factor information.
So, if you want to say that your Investigation compared two different values for a single factor, the Investigation has the factor type, but not the data for the factor or the values themselves. However, you can connect to the data made from the various omics technologies via the DataPartition class. There could be a problem where only a set of factors taken together makes a particular set of data useful. Example: your important aggregate of factors is time1.mouse1.foodstuff1. Each of these factors would have to be named separately, and you would get a different slice of data for time1 than you would for mouse1. How do you join them up? Perhaps allow multiple FactorValues (and OntologyTerms) for a single Factor. Not a very nice solution, though. Perhaps you don't need to change it at all, as you would only add Factors that are relevant to a particular InvestigationComponent. How do you describe which combinations of Factors are the combinations you're interested in? Norman did this by seeing an IC as a particular run of an experiment.
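One way to picture the "join them up" problem is as a set intersection: if each factor value points at its own slice of the data, the aggregate of factors corresponds to the rows that all of the slices share. The sketch below is only an illustration of that idea in Python; the `SLICES` mapping and the row numbers are invented, and nothing here is part of the FuGE model or API.

```python
from functools import reduce

# Hypothetical mapping from (factor, value) to the row indices of one data set
# that were recorded under that factor value.
SLICES = {
    ("time", "time1"): {0, 1, 2, 3},
    ("mouse", "mouse1"): {1, 3, 5},
    ("foodstuff", "foodstuff1"): {3, 4, 5},
}


def rows_for_combination(slices, combination):
    """Return the rows shared by every (factor, value) pair in the combination."""
    selected = [slices[pair] for pair in combination]
    return reduce(set.intersection, selected) if selected else set()


combo = [("time", "time1"), ("mouse", "mouse1"), ("foodstuff", "foodstuff1")]
print(sorted(rows_for_combination(SLICES, combo)))
```

Each single factor gives a different slice, and only the intersection picks out the data for time1.mouse1.foodstuff1.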
Dimensions are used in FuGE as a way of naming coordinates in a matrix. This does not mean that the data has to be stored here. You can store the data internally via the InternalData class, or you can reference it externally via the ExternalData class (or, of course, create your own subclass of Data).
There are 21 <>'s in FuGE, and all but 6 have identical concrete associations. Some auto-generated AndroMDA code mistakenly ignores the “abstract” parts and incorrectly generates the methods etc. In this case, you can just delete the abstract association in your copy of the UML and re-generate the code. It should be fixed within AndroMDA, though.
For multi-dimensional data, DataPartitions are meant as a mechanism to relate back to the data from the Investigation, but many groups will choose not to use DataPartition. Very big, regularly-shaped data sets will be good things to use DataPartition with (e.g. Flow Cytometry). In the case of proteomics data, this may be more of a challenge. A best-practices document should contain information on which data types are best-suited to this system, and which aren't. It should also include any alternatives to this system. One alternative solution to using DataPartition and its associated coordinate system for dimensional data is to build an association from the data of interest back to FactorValue.
What is PartitionPair? It is for the case where you have two data files, and you wish to associate a particular row of one data file (for example) with a particular spectrum in another (to continue the example). So, it is a “shortcut” to linking particular data sets.
How should users of FuGE build CV lists using the Ontology Package? An OntologyTerm has an OntologyProperty, which contains both a DataProperty and ObjectProperty (these are the relationships within an ontology). Also inside OntologyTerm is OntologyIndividual. OI is the individual itself. Why not just provide the term – why try to recreate the structure of an ontology in UML? However, in OWL, every single class, relationship etc. has a URI, so why not use those in UML? An example use: you have in an ontology the concept of age, which has an initial time point and a unit. How do you pull that concept into the UML? We're essentially creating a cut-down version of the ontology to allow extensibility in FuGE. But why would you want this? To create an individual of an ontology within the UML. It also allows restriction of the name-values (left and right-hand sides of a relationship) to those that are allowed within the ontology. One opinion is that there shouldn't be a purpose-built extensibility point in UML, as the entire purpose of UML is that it is extensible everywhere. It also means that users of your FuGE file don't need to parse both that file and the ontology file. However, the users of your file must understand the extensibility point that you've made, which isn't useful. The extra knowledge should be stored in the ontology, in the same way as analysisXML links to mzML. One solution is to have a Property class with term “height” and a Value class with term “meters”, and a PV class with associations to both Property and Value that provides the link. In the end, this is optional. In the guidelines, these concerns should be expressed.
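The Property/Value/PV suggestion can be sketched in a few lines of Python. This is only an illustration of the shape of that solution: the class names follow the notes, but the `ALLOWED_VALUES` table and the `link` helper are assumptions standing in for whatever restrictions the ontology itself would provide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    term: str  # e.g. "height"


@dataclass(frozen=True)
class Value:
    term: str  # e.g. "meters"


@dataclass(frozen=True)
class PropertyValue:
    """The "PV" class: it only holds associations to a Property and a Value."""
    prop: Property
    value: Value


# Hypothetical restrictions, standing in for what the ontology would allow.
ALLOWED_VALUES = {"height": {"meters", "centimeters"}}


def link(prop, value, allowed=ALLOWED_VALUES):
    """Create the PV link, rejecting pairs the (assumed) ontology does not allow."""
    if value.term not in allowed.get(prop.term, set()):
        raise ValueError(f"{value.term!r} is not allowed for {prop.term!r}")
    return PropertyValue(prop, value)


pv = link(Property("height"), Value("meters"))
print(pv)
```

The restriction check is where the left and right-hand sides of a relationship get limited to what the ontology permits; without it, the three classes are just a generic name-value pair.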
Other questions not fully addressed:
How do we find out when Generic* classes should be replaced by a specific omics-type class? Rather than using named associations to GenericParameter, GelML tended to use either GenericParameter (with a CV term) or extended the Parameter class. Is there a best way to use Parameter/GenericParameter? If it is the same shape as the Generic class, and you are just renaming it, that is a good argument for using an ontology term. However, there is less of a learning curve for users if you subclass GenericParameter with your own name. Subclassing can lock you in, and may make life more difficult further down the line if your requirements change. Remember though, hardly anyone will write XML by hand, and we shouldn't worry too much about tool implementation. Still want to make it easier for tool developers, though!
How should we model time?
For experiment-specific semantics, when should we extend the FuGE model rather than add information through the use of ontologies?
How descriptive should extending communities get with their samples? Should it be at the entity or attribute level? Is there a best-practice that should be documented?
How do we find out if two classes from two different communities are actually the same? Recurring model requirements could be handled by, e.g., a library of model fragments (e.g. time and sample).
Could organism be fitted into FuGE::Bio somewhere?
There is no date of the Action in the ActionApplication. You could have a time parameter that comes in when you add it to your own subclass of Action/ActionApplication, and then provide a different value for that parameter in ActionApplication.
Somewhere, the distinction between Action and Protocol should be defined.
In general, we should describe a modelling best-practice to tell what is considered “standard” procedure.
Data package: internal versus external data
There may be an issue with describing physical materials within Protocols versus ProtocolApplications (theoretical materials vs physical materials, SubstanceMixtureProtocol was designed to account for this problem)
Monday (19 November, 2007) saw the start of the two-day CISB07 conference, this year's internal conference for the 6 Centres for Integrated Systems Biology (for more information, see http://www.cisbs.org). There were also a number of invited keynote speakers. As I am unsure of the private/public status of many of the talks, I will just present some notes on those talks that I know are in the public domain.
CISB07 is being held at the Centre for Life in Newcastle. The Newcastle University CISB (CISBAN) is the host of the meeting, and I was one of about 8 people from CISBAN that helped organize it. It seems to have come together really well - it looked like everyone had a good time today. The Centre for Life is a science museum mainly for kids, but it has conferencing facilities too. I have to say that the people running the CfL today were very helpful and friendly. I'd definitely recommend the venue. After the afternoon session we had drinks on the mezzanine level of the museum, where many of the interactive (ok, and meant for children) games were. We had free rein on these games (reaction time games, soccer goal-scoring games, etc) and it was really fantastic to see all these scientists playing them with what can truly only be described as glee. Further, the CfL had their Motion Ride on for us - it's like a small cinema that plays a movie that's about 5 minutes long, and all of the chairs are on their own hydraulic pumps, and the chairs move in time with the movie. The theme tonight was "Dracula's Haunted House", and I have to admit I went on it - twice. Then it was in for dinner. However, it is the games that were really fun!
I gave a talk on SyMBA today as well, which I won't go into any more other than to say I'll put up the slides on the SourceForge site later this week, when I get a minute. As I can't be sure of the private/public status of any of the talks other than mine and the Data Integration Keynote (the keynote for the session I was in), I'll just pop up my notes from that keynote, from Dr. Chris Taylor of the EBI. It should be an interesting day tomorrow! Other than Chris' talk (notes below), I found that Michael Wilson's (from CPIB) and Mark Sansom's talks were also some of the more interesting talks of the day.
Standards development: A Two-Way Street (or, I Saw Six CISBs Come Sailing By)
(Chris Taylor, Keynote, Session 2)
There are three big omics standards bodies in biological science: MGED, PSI, MSI. PSI's parent organization is HUPO, and PSI deals with molecular interactions, post-translational modifications, mass spec, separations, gels, etc. They also have a number of formal groups, e.g. a Steering Committee and working groups, and a document release process.
Many community formats are already built on FuGE-OM (GelML, mzML, analysisXML, etc). These standards provide:
- increased efficiency: methods remain properly associated with the results generated, and there is no need to repeatedly construct sets of contextualizing information; this matters for industry specifically (in the light of 21 CFR Part 11),
- enhanced confidence in data: they enable fully-informed assessment of results, support assessment of results that may have been generated months ago, facilitate better-informed comparisons of data sets, support the discovery of sources of systematic or random error by correlating errors with metadata features such as the date or the operator concerned, and allow follow-through with experiments performed, and
- added value & tool development: re-using existing data sets for a purpose significantly different to that for which the data were generated, building aggregate data sets containing similar data from different sources, integrating data from different domains, and design requirements that become both explicit and stable.
Experiments have generic features, as well as technologically and biologically-delineated sections. But you can't ever really carve these things up this way all the time, as technologies cut across biological communities, and vice versa. The difficulty also comes from trying to get a mixed-community group of people to agree on what a single term means. This is why they didn't use the term “experiment” in their structure.
Instead, they use an Investigation-Study-Assay (ISA) structure. The ISA structure starts out generic, and then you can extend it for your community. RSBI (of which CISBAN is a part) includes MIBBI, FuGE, and OBI. MI checklists are usually developed independently, which means they're usually partially redundant. That's where MIBBI comes in. Where they do overlap, they may cut things up differently or word things differently. Which MI should you use? There could be more than one that applies.
The CISBs are a prime source of researchers with a cross-domain view. MIBBI has already established contact with all six CISBs, and is waiting for tech and research summaries from them. Overall, what he wants is more interaction between systems biologists and “standardizers”.
It was a humorous and informative talk, and I really think he got the point about standards and common formats across to the group.
A couple of papers from here at Newcastle University have appeared over the past couple of weeks. Here's a summary of them both.
- Data Standards
From "An Update on Data Standards for Gel Electrophoresis" in Practical Proteomics Issue 1, September 2007, and by Andrew R. Jones and Frank Gibson.
From the abstract: "We report on standards development by the Gel Analysis Workgroup of the Proteomics Standards Initiative. The workgroup develops reporting requirements, data formats and controlled vocabularies for experimental gel electrophoresis, and informatics performed on gel images. We present a tutorial on how such resources can be used and how the community should get involved with the on-going projects. Finally, we present a roadmap for future developments in this area."
Provides a summary of ongoing work in the Gel electrophoresis and Gel informatics fields in terms of data and metadata standardization. This includes work on MIAPE GE and MIAPE GI, two checklists for minimal information required on these types of experiments and analyses. For both GE and GI, there are data formats (GelML and GelInfoML, respectively, both extensions of FuGE) and a suggested controlled vocabulary (sepCV). More information can be found on http://www.psidev.info.
Frank works in the CARMEN neuroscience project here at Newcastle, and Andy is in Liverpool and works on, among other things, FuGE. CARMEN collaborates with the SyMBA project, which was originally developed by me and a few others within Neil Wipat's Integrative Bioinformatics Group here at Newcastle but which is now a SourceForge project at http://symba.sf.net. Andy Jones is a co-author with me, Neil Wipat, Matt Pocock and Olly Shaw on an upcoming SyMBA paper.
- Semantic Data Integration
A paper that was presented at the Integrative Bioinformatics Conference 2007 by me and my co-authors, Matt Pocock and Neil Wipat, is now available from the Journal of Integrative Bioinformatics website.
Allyson L. Lister, Matthew Pocock, Anil Wipat. Integration of constraints documented in SBML, SBO, and the SBML Manual facilitates validation of biological models. Journal of Integrative Bioinformatics, 4(3):80, 2007.
We have just opened up the Systems and Molecular Biology Data and Metadata Archive (SyMBA, formerly known as the CISBAN DPI) to the community under the terms of the GNU LGPL. Its new home is SourceForge, and there is a subversion repository, installation instructions, mailing list, issue tracker etc available.
The main URL is:
The Project Page on SF (where you can get to screenshots, subversion browsing etc) is here:
http://sourceforge.net/projects/symba/
We changed the name of the project to reflect the wider diversity of developers now contributing to the project. I will be sending information, announcements, and answers to questions on the symba developers mailing list, which everyone can subscribe to:
symba-devel@lists.sourceforge.net
If you wish to subscribe, please go here:
http://lists.sourceforge.net/mailman/listinfo/symba-devel
We'd be happy to have additional developers on the project, and if there is any feature or bug you'd like to report, please use our issue trackers:
http://sourceforge.net/tracker/?group_id=202680
If you'd like to take a more hands-on approach, then please email me your sourceforge user id, and I'll add you as a developer on the project.
SyMBA was initially developed (and is still mainly developed) by the Integrative Bioinformatics Group, headed by Neil Wipat and part of CISBAN. Many thanks to all who have helped, via code or comment, and also to the current SourceForge Developers listed below:
Allyson Lister (CISBAN)
Olly Shaw (CISBAN)
Dan Swan (Newcastle Bioinformatics Support Unit)
Frank Gibson (CARMEN Neuroscience Project)
SyMBA is also being evaluated by other members of the CARMEN project and by CSBE.
The sandbox to play around with SyMBA is now up again at http://bsu.ncl.ac.uk:8081/symba after a major disk malfunction on the old server. I'll transfer all logins from the old system to the new one, and if you'd like a new login, please drop me a line. In the meantime, please have a look around the new SourceForge site and also the code, if you like! All comments and suggestions welcome.
Some general information:
The Centre for Integrated Systems Biology of Ageing and Nutrition has developed a data archive and integrator (SyMBA) based on Milestone 3 of the Functional Genomics Experiment (FuGE) Object Model (FuGE-OM), and which archives, stores, and retrieves raw high-throughput data. Until now, few published systems have successfully integrated multiple omics data types and information about experiments in a single database. SyMBA is the first published implementation of FuGE that includes a database back-end, expert and standard interfaces, and a Life Science Identifier (LSID) Resolution and Assigning service to identify objects and provide programmatic access to the database. Having a central data repository prevents deletion, loss, or accidental modification of primary data, while giving convenient access to the data for publication and analysis. It also provides a central location for storage of metadata for the high-throughput data sets, and will facilitate subsequent data integration strategies.
The All Hands Meeting is underway in Nottingham this week, and Frank Gibson has started putting up some posts on it. So take a look every so often to see what's going on there with CARMEN and the many other projects Newcastle University and others are discussing.
Other than where specified, these are my notes from the IB07 Conference, and not expressions of opinion. Any errors are probably just due to my own misunderstanding. :)
Talk about multi-value networks, high-level petri nets, and the differences with boolean networks. Formal methods are required to model and analyse complex regulatory interactions. Boolean networks offer a good starting point, but are often too simplistic. Multi-value networks (MVNs) are qualitative, and are seen as a middle ground between differential equation models and boolean networks.
He has applied high-level Petri net techniques and a wide range of analysis tools. In MVNs, entities assume a range of values (0...n). Each entity has a neighbourhood of other entities that affect it, and the behaviour of each entity is described using state tables. However, we can't really analyse this: that's where Petri nets come in. They have a graphical notation with mathematical semantics and can model choice, synchronization and concurrency. They have an expressive framework with data types and equational description of behaviour. There is a wide range of analysis techniques and tool support, e.g. model checking. Petri nets use a kind of tokenizing system.
Their approach was as follows. They have defined a set of state transition tables that completely define the model. Equational definitions are extracted from these tables, and then a Petri net is constructed. They also use multi-value logic minimization, applied to each state transition table, to simplify the information from the tables. Construction of the high-level Petri net begins with a single place for each entity, connected to a central transition. The transition encodes the equational specification of network behaviour. Each place "x" is connected to the transition node with an input arc "x" and an output arc "x".
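To help picture what a set of state transition tables that completely define a model looks like, here is a toy multi-value network in Python. It is my own illustration, not the speaker's model and not a Petri net: the entities `x` and `y`, their tables, and the choice of a synchronous update (every entity updated at once) are all assumptions.

```python
# Toy network: entity -> (neighbourhood, state transition table).
# x takes values 0..2 and depends on y; y takes values 0..1 and depends on x.
NETWORK = {
    "x": (("y",), {(0,): 2, (1,): 0}),
    "y": (("x",), {(0,): 0, (1,): 0, (2,): 1}),
}


def step(network, state):
    """Synchronously apply every entity's state table to the current state."""
    return {
        entity: table[tuple(state[n] for n in neighbours)]
        for entity, (neighbours, table) in network.items()
    }


def trace(network, state, max_steps=20):
    """Follow the trajectory until a state repeats or max_steps is reached."""
    seen = [dict(state)]
    for _ in range(max_steps):
        state = step(network, state)
        if state in seen:
            break
        seen.append(state)
    return seen


for s in trace(NETWORK, {"x": 0, "y": 0}):
    print(s)
```

Even this tiny example cycles through four states rather than settling, which hints at why exhaustive analysis by hand does not scale and why tool support such as model checking is attractive.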
They showed how this worked through carbon starvation in E. coli. Exponential growth occurs where there is sufficient carbon, but the cells enter a stationary phase when the carbon is depleted. The model is validated by checking known properties. Then, you can look at dynamic properties. A mutant analysis was also done, where you can "knockout" or overexpress key genes and observe the effect.
Finally, they do a model comparison with the Boolean network equivalent of this model. There are differences, which leads to some interesting questions: how much detail is required in the model? Is the model representable in the boolean domain?
My opinion: A great, interesting talk that flowed well and was easy to understand. Slides were a little overfull, but it didn't detract. A natural speaker.
Integration of constraints documented in SBML, SBO, and the SBML Manual facilitates validation of biological models
Published September 2007 by the Journal of Integrative Bioinformatics
Allyson L. Lister1,2, Matthew Pocock2, Anil Wipat1,2,*
1 Centre for Integrated Systems Biology of Ageing and Nutrition (http://www.cisban.ac.uk)
2 School of Computing Science (http://www.cs.ncl.ac.uk),
Newcastle University (http://www.ncl.ac.uk)*
Abstract
The creation of quantitative, simulatable, Systems Biology Markup Language (SBML) models that accurately simulate the system under study is a time-intensive manual process that requires careful checking. Currently, the rules and constraints of model creation, curation, and annotation are distributed over at least three separate documents: the SBML schema document (XSD), the Systems Biology Ontology (SBO), and the “Structures and Facilities for Model Definition” document. The latter document contains the richest set of constraints on models, and yet it is not amenable to computational processing. We have developed a Web Ontology Language (OWL) knowledge base that integrates these three structure documents, and that contains a representative sample of the information contained within them. This Model Format OWL (MFO) performs both structural and constraint integration and can be reasoned over and validated. SBML Models are represented as individuals of OWL classes, resulting in a single computationally amenable resource for model checking. Knowledge that was only accessible to humans is now explicitly and directly available for computational approaches. The integration of all structural knowledge for SBML models into a single resource creates a new style of model development and checking.
Introduction
Systems Biology Markup Language[1] (SBML) is an XML format that has emerged as the de facto standard file format for describing computational models in systems biology. It is supported by a vibrant community who have developed a wide range of tools, allowing models to be generated, analysed and curated in any one of many independently maintained software applications[1]. The Systems Biology Ontology[2][2] (SBO) was developed to enable a useful understanding of the biology to which a model relates, and to provide well-understood terms for describing common modelling concepts. The community is engaged in an on-going effort to develop the SBML standard in ways needed to support systems biology applications. As part of this process, a manual is maintained that describes and defines SBML and SBO[3].
The biological knowledge used to create and annotate a high-quality SBML model is typically analysed and integrated by a researcher. These modellers know and understand both the systems they are modelling and the intricacies of SBML. However, as with most areas of biology, the amount of data that is relevant to generating even a relatively small and well-scoped model is overwhelming. In order to extend the range of modelling tasks that can be automated, it is necessary to both capture the salient biological knowledge in a form that computers can process, and represent the SBML rules in a way computers can systematically interpret. Here we address the latter issue: describing SBML, SBO and the rules about what constitutes a correctly formed model in a way suitable for computational manipulation.
The Semantic Web[4] can be seen as today’s incarnation of the goal to allow computers to go beyond performing numerical computations, and to share and integrate information more easily. There are now several standards forming within the Semantic Web community that together formalise computational languages for representing knowledge and strictly define what conclusions can be reached from facts expressed in these languages. The Web Ontology Language[3][5] (OWL) is one such language that enjoys strong tools support and which is used for capturing biological and medical knowledge (e.g. OBI[6], BioPax[7], EXPO[4], and FMA[5] and GALEN[6] in OWL). Once the information about the domain has been modelled in an OWL file, a software application called a reasoner[7, 8] can automatically deduce all other facts that must logically follow as well as find inconsistencies between asserted facts.
The knowledge about a system described in SBML can be divided into two parts. Firstly, there is the biological knowledge. This includes information about the biological entities involved and their biological. Secondly, there is the structural knowledge, describing how the biological knowledge must be captured in well-formed documents suitable for processing by applications. In the case of a high-quality SBML model, the structural knowledge required to create such a model is tied up in three main locations:
- The Systems Biology Markup Language (SBML[1][8]) XML Schema Document (XSD[9]), describing the range of XML documents considered to be in SBML syntax,
- The Systems Biology Ontology (SBO[2][10]), describing the range of terms that can be used to describe parts of the model in a way understandable to the community using the Open Biological Ontologies (OBO[11]) format, and
- The "Structures and Facilities for Model Definition" document[12] (hereafter referred to as the "SBML Manual"), describing many additional restrictions and constraints upon SBML documents, and the context within which SBO terms can be used, as well as information about how conformant documents should be interpreted.
From a knowledge-engineering point of view, it makes sense to represent these sources of structural knowledge as part of a single knowledge base. Although, to a knowledge-engineer, this current separation of documents could appear arbitrary, it is in fact well-motivated according to consumers of each type of information. The portion of the knowledge codified in SBML transmits all of and only the information needed to parameterise and run a computational simulation of the system. The knowledge in SBO is intended to aid humans in understanding what is being modelled. The SBML Manual is aimed at tools developers needing to ensure that software developed is fully compliant with the specification.
Only two of these three sources of structural knowledge are directly computationally amenable. SBML has an associated XSD that describes the range of legal XML documents, which elements and attributes must appear, and constraints on the values of text within the file. SBO captures a term hierarchy containing human-readable descriptions and labels for each term and a machine-readable ID for each term. Neither of these documents contains much information about how XML elements or SBO terms should be used in practice, how the two interact, or what a particular conformant SBML document should mean to an end-user. The majority of information required to develop a format-compliant model is in the SBML Manual, in formal English. Anything more than simple programmatic steps, such as XML validation, can currently only be done by manually encoding the English descriptions in the SBML Manual into rules in a program. libSBML[13] is the reference implementation of this procedure, capturing the process of validating constraints. Manual encoding provides scope for misinterpretation of the intent of the SBML Manual or may produce code that accepts or generates non-compliant documents due to silent bugs. In practice, these problems are ameliorated by regular SBML Hackathons[14] and the use of libSBML by many SBML applications. However, the need for a more formal and complete description of the information in the SBML Manual becomes more pressing as the community grows beyond the point where all of the relevant developer groups can be adequately served by face-to-face meetings.
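As an illustration of what "manually encoding the English descriptions into rules in a program" means, here is a minimal Python sketch of one such rule: a species' `compartment` attribute should name a compartment defined in the same model. This is a simplified stand-in written for this post, not libSBML and not MFO; the function name and the toy model are assumptions, and the check ignores everything else a real validator would consider.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the "{namespace}" prefix ElementTree adds to tag names."""
    return tag.split("}")[-1]


def undefined_compartments(sbml_text):
    """Return (species id, compartment) pairs whose compartment is not declared."""
    root = ET.fromstring(sbml_text)
    declared = {e.get("id") for e in root.iter() if local(e.tag) == "compartment"}
    return [
        (e.get("id"), e.get("compartment"))
        for e in root.iter()
        if local(e.tag) == "species" and e.get("compartment") not in declared
    ]


# Toy model, for illustration only.
SAMPLE = """<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="toy">
    <listOfCompartments>
      <compartment id="cell"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="glucose" compartment="cell"/>
      <species id="atp" compartment="cytosol"/>
    </listOfSpecies>
  </model>
</sbml>"""

print(undefined_compartments(SAMPLE))
```

Every rule of this kind is one more hand-written function that has to be kept in step with the English text, which is exactly the scope for misinterpretation and silent bugs described above.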
We find that some of these issues can be avoided by combining the structural knowledge currently spread across three documents in three formats into a single computationally amenable resource. This method of constraint integration for all information pertinent to SBML will require a degree of rigour that can only improve the clarity of the specification. Once established, standard OWL tools can be used to validate and reason over SBML models, to check their conformance and to derive any conclusions that follow from the facts stated in the document, all without manual intervention.
To address this proposition, we have developed the Model Format OWL (MFO), implemented in OWL-DL and capturing the SBML structure plus a representative sample of SBO and human-readable constraints from the SBML Manual. We demonstrate that MFO is capable of directly capturing many of the structural rules and semantic constraints documented in the SBML Manual. The mapping between SBML documents and the OWL representation is bi-directional: information can be parsed as OWL individuals from an SBML document, manipulated and studied, and then serialized back out again as SBML. We demonstrate feasibility with two simple, illustrative, examples. In future, we hope to use this as the basis for a method of automatically improving the annotation of SBML models with rich biological knowledge, and as an aid to principled automated model improvement and merging.
The integration of all structural knowledge for SBML models into a single resource creates a new style of model document development, which we believe will greatly reduce the overheads associated with computational transformations between biological knowledge and high-quality systems biology models. MFO is not intended to be a replacement for any of the APIs or software programs available to the SBML community today. It addresses the very specific need of a sub-community within SBML that wishes to be able to express their models in OWL for the purpose of reasoning, validation, and querying. It has also been created as the first step in a larger data integration strategy that will eventually encompass the biological as well as structural knowledge present in SBML documentation and models.
[1] Hucka, M. et al.: The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics (Oxford, England) 19 (2003) 524-531
[2] Le Novere, N.: Model storage, exchange and integration. BMC Neurosci 7 Suppl 1 (2006) S11
[3] Horrocks, I., Patel-Schneider, P.F., van Harmelen, F.: From SHIQ and RDF to OWL: The making of a web ontology language. J. of Web Semantics 1 (2003) 7-26
[4] Soldatova, L.N., King, R.D.: An ontology of scientific experiments. Journal of the Royal Society, Interface / the Royal Society 3 (2006) 795-803
[5] Heja, G., Varga, P., Pallinger, P., Surjan, G.: Restructuring the foundational model of anatomy. Studies in health technology and informatics 124 (2006) 755-760
[6] Heja, G., Surjan, G., Lukacsy, G., Pallinger, P., Gergely, M.: GALEN based formal representation of ICD10. International journal of medical informatics 76 (2007) 118-123
Enjoyed this? To read the rest, please see the Journal of Integrative Bioinformatics.
A Technical Report for the School of Computing Science of Newcastle University was released last month describing the CISBAN DPI, an implementation of the FuGE Milestone 3 STK. You can find and download that technical report here:
http://www.cs.ncl.ac.uk/research/pubs/trs/abstract.php?number=1016
The Abstract follows:
The Centre for Integrated Systems Biology of Ageing and Nutrition has developed a Data Portal and Integrator (CISBAN DPI) that is based on the FuGE Object Model and which archives, stores, and retrieves raw high-throughput data. Until now, few published systems have successfully integrated multiple omics data types and information about experiments in a single database. The CISBAN DPI is the first published implementation of FuGE that includes a database back-end, expert and standard interfaces, and utilizes a Life Science Identifier (LSID) Resolution and Assigning service to identify objects and provide programmatic access to the database. Having a central data repository prevents deletion, loss, or accidental modification of primary data, while giving convenient access to the data for publication and analysis. It also provides a central location for storage of metadata for the high-throughput data sets, and will facilitate subsequent data integration strategies.
Keywords
Functional Genomics, High-Throughput Experiments, FuGE, LSID, Experimental Workflows, Databases, Data Standards, Data Sharing, Metadata, Data Integration.
CS-TR: 1016 Implementing the FuGE Object Model: a Systems Biology Data Portal and Integrator,
Lister, A. L., Jones, A. R., Pocock, M., Shaw, O., Wipat, A.
School of Computing Science, Newcastle University, Apr 2007