
About Fluidinfo


Fluidinfo is an online information storage and search platform. Its design supports shared openly-writable metadata of any type and about anything, provides modern writable APIs, and allows data to be branded with domain names.

Read-only information storage is a major disadvantage for business in a world increasingly driven by lightweight social applications. In the natural world we routinely put metadata where it is most useful, e.g., bookmarks in books, post-it notes in specific locations, and name tags around necks at conferences. But in the digital world businesses and consumers usually cannot store metadata in its most useful location—in context—because traditional storage is not openly writable. Potential value is reduced when data is in read-only silos as related information cannot be combined or searched across. Flexibility and spontaneity are restricted when future needs must be anticipated or write permission must first be obtained.

The unique properties of Fluidinfo change this situation completely.

Fluidinfo provides a universal metadata engine because it has an object for everything imaginable, just like Wikipedia has a web page for everything. Fluidinfo objects can always have data added to them by any user or application, so related metadata can be stored in the same place. This allows it to be combined in searches, and remixed to great effect, increasing its value over metadata held in isolated databases. Fluidinfo allows information owners to put their internet domain names onto data and has a simple, flexible, and powerful permissions system.
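
As a purely illustrative sketch of what openly writable, domain-branded metadata can look like over HTTP (the host, paths, headers, tag names and query syntax below are assumptions for illustration, not taken from this page), tagging a shared object and then searching across such tags might look roughly like this:

# Hypothetical sketch only: endpoint paths, value content type and query syntax are assumptions.
# Put a personal "rating" tag on the shared object that is about a particular book.
curl -X PUT \
  -H "Content-Type: application/vnd.fluiddb.value+json" \
  -d '5' \
  "https://fluiddb.fluidinfo.com/about/book:nineteen-eighty-four/alice/rating"

# Later, anyone with read permission could search across everybody's tags in one query.
curl "https://fluiddb.fluidinfo.com/values?query=alice/rating%20%3E%203&tag=fluiddb/about"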

Humans are diverse and unpredictable. We create, share, and organize information in an infinite variety of ways. We've even built machines to process it. Yet for all their capacity and speed, using computers to work with information is often awkward and frustrating. We are allowed very little of the spontaneity that characterizes normal human behavior. Our needs must be anticipated in advance by programmers. Far too often we can look, but not touch.

Why isn't it easier to work with information using a computer?

At Fluidinfo we believe the answer lies in information architecture. A rigid underlying platform inhibits or prevents spontaneity. A new information architecture can be the basis for a new class of applications. It can provide freedom and flexibility to all applications, and these advantages could be passed on to users.

We've spent the last several years designing and building Fluidinfo to be just such an architecture.

Fluidinfo makes it possible for data to be social. It allows almost unlimited information personalization by individual users and applications, and also between them. This makes it simple to build a wide variety of applications that benefit from cooperation, and which are open to unanticipated future enhancements. Even more importantly, Fluidinfo facilitates and encourages the growth of applications that leave users in control of their own data.

Introducing fise, the Open Source RESTful Semantic Engine


Aug 30, 2010


As a member of the IKS European project, Nuxeo contributes to the development of an Open Source software project named fise, whose goal is to help bring new and trendy semantic features to CMS by giving developers a stack of reusable HTTP semantic services to build upon.

As such concepts might be new to some readers, the first part of this blog post is presented as a Q&A.

A semantic engine is a software component that extracts the meaning of an electronic document so as to organize it as partially structured knowledge rather than just a piece of unstructured text content.

Current semantic engines can typically:

  • categorize documents (is this document written in English, Spanish or Chinese? is this an article that should be filed under the Business, Lifestyle or Technology categories? ...);
  • suggest meaningful tags from a controlled taxonomy and assert their relative importance with respect to the text content of the document;
  • find related documents in the local database or on the web;
  • extract and recognize mentions of known entities such as famous people, organizations, places, books, movies, genes, ... and link the document to their knowledge base entries (like a biography for a famous person);
  • detect yet unknown entities of the same aforementioned types to enrich the knowledge base;
  • extract knowledge assertions that are present in the text to fill up a knowledge base, along with a reference to trace the origin of the assertion. Examples of such assertions could be the fact that a company is buying another along with the amount of the transaction, the release date of a movie, or the new club of a football player...

During the last couple of years, many such engines have been made available through web-based APIs such as Open Calais, Zemanta and Evri, just to name a few. However, to our knowledge there aren't many such engines distributed under an Open Source license to be used offline, on your private IT infrastructure, with your sensitive data.

Linking content items to semantic entities and topics that are defined in open universal databases (such as DBpedia, Freebase or the NY Times database) allows many content-driven applications, such as online websites or private intranets, to share a common conceptual frame and to improve findability and interoperability.

Publishers can leverage such technologies to build automatically updated entity hubs that aggregate resources of different types (documents, calendar events, persons, organizations, ...) related to a given semantic entity, identified by disambiguated universal identifiers that span all applications.

If you are not yet convinced, please have a look at this BBC use case and this 3-minute video by the fine Freebase folks.

You can test fise using the online demo, download a snapshot of the all-in-one executable jar launcher (67MB), or build your own instance from source. If you want to run a local instance, just launch it with a Java 6 virtual machine as follows:

 java -Xmx512M -jar eu.iksproject.fise.launchers.sling-0.9-20100802.jar

And point your browser to http://localhost:8080 instead of http://fise.demo.nuxeo.com in the following examples.

Once the server is up and running, fise offers three HTTP endpoints: the engines, the store and the sparql endpoint:

  • the /engines endpoint allows the user to analyse English text content and sends back the results of the analysis without storing anything on the server: this is a stateless HTTP service;
  • the /store endpoint does the same analysis but furthermore stores the results on the fise server: this is a stateful HTTP service. Analysis results are then available for later browsing;
  • the /sparql endpoint provides machine-level access to perform complex graph queries on the enhancements extracted from content items sent to the /store endpoint (see the sketch after this list).
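
For instance, a query against the /sparql endpoint might look like the sketch below. This assumes the endpoint accepts a standard SPARQL protocol query parameter and result format; the query itself is deliberately generic rather than tied to the fise enhancement vocabulary.

# Sketch only: assumes the standard SPARQL protocol "query" parameter and result media type.
curl -X POST \
 -H "Accept: application/sparql-results+xml" \
 --data-urlencode "query=SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10" \
 http://fise.demo.nuxeo.com/sparql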

Let us focus on the /engines endpoint. The view first lists the active registered analysis components and then asks for user input. Type an English sentence that mentions famous or non-famous people, organizations and places such as countries and cities. If you are lazy, just copy and paste an article from a public news feed such as Wikinews and submit your content with "Run engines". Depending on the registered engines and the length of your content, the processing time will typically vary from less than one second to around a minute.

Submitting text content to the /engines endpoint using the web interface


By default fise launches three engines in turn:

  • the first engine performs named entity detection using the OpenNLP library: it will try to find occurrences of names of people, places and organizations based on a statistical model of the structure of English sentences;
  • the second engine tries to recognize the previously detected entities using a local Apache Lucene index of the top 10,000 most famous entities from DBpedia. This index is configurable and will be improved in future versions of fise;
  • the last engine then asynchronously fetches additional data from DBpedia, such as the GPS coordinates of places, thumbnails and Wikipedia descriptions of the recognized entities. Fetched entities are cached in the fise store for faster lookup the next time the entity is recognized. A summary of this information is then displayed in the fise UI as columns of entities, and a world map displays the locations:

Overview of the extracted entities in the submitted text.

Up until now we have used the web user interface, meant for human beings who want to test the capabilities of the engines manually and navigate through the results in their browser. This is primarily a demo mode.

The second way to use fise is the RESTful API for machines (e.g. third-party ECM applications such as Nuxeo DM and Nuxeo DAM) that will use fise as an HTTP service to enhance the content of their documents. The detailed documentation of the REST API is available on a per-endpoint basis in the Web UI by clicking on the "REST API" link in the top right corner of the page:


Accessing the inline documentation for the REST API

curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" \
 --data "Fise can detect famous cities such as Paris." \
 http://fise.demo.nuxeo.com/engines/

Right now the packaged engines can only deal with English text content. We plan to progressively add statistical models for other languages as well.

Right now if you submit a sentence that starts with "United Kingdom prime minister David Cameron declared to the press..." you will get an output such as:


"David Cameron" is detected as a person but not recognized since the fise index was built on a DBpedia dump extracted before his election. Furthermore fise is currently not able to extract the relation between the entity "David Cameron" and the entity "United Kingdom". In future versions of fise we plan to extract the role "prime minister" that links the person to the country. This should be achievable by combining syntactic parsing with semantic alignment of english words with an ontology such as DBpedia.

Extracting relations between entities will help knowledge workers incrementally build large knowledge bases at a low cost. This can be very interesting for economic intelligence or data-driven journalism: imagine automatically building the social networks of public figures from news feeds, along with their relationships with business entities such as companies and financial institutions.

Right now fise is a standalone HTTP service with a basic web interface mainly used for demo purposes. To make it really useful some work is needed to integrate it with the Nuxeo platform so that Nuxeo DM, Nuxeo DAM and Nuxeo CMF users will benefit from a seamless semantic experience.

 

Hey, by the way, Nuxeo is recruiting. We're looking for skilled, motivated, easy-to-work-with Java developers (junior, senior, or even interns, as long as you're willing to keep growing your skill set with the rest of the team). So check out these job descriptions and apply (write to jobs(at)nuxeo.com) if you're interested.

 


BBC World Cup 2010 dynamic semantic publishing


Jem Rayfield | 10:00 UK time, Monday, 12 July 2010

The World Cup 2010 website is a significant step change in the way that content is published. On first using the site, the most striking changes are the horizontal navigation and the larger-format, high-quality video. As you navigate through the site it becomes apparent that this is a far deeper and richer use of content than can be achieved through traditional CMS-driven publishing solutions.

The site features 700-plus team, group and player pages, which are powered by a high-performance dynamic semantic publishing framework. This framework facilitates the publication of automated metadata-driven web pages that are light-touch, requiring minimal journalistic management, as they automatically aggregate and render links to relevant stories.


Dynamic aggregation examples include:

The underlying publishing framework does not author content directly; rather it publishes data about the content - metadata. The published metadata describes the World Cup content at a fairly low level of granularity, providing rich content relationships and semantic navigation. By querying this published metadata we are able to create dynamic page aggregations for teams, groups and players.

The foundation of these dynamic aggregations is a rich ontological domain model. The ontology describes entity existence, groups and relationships between the things/concepts that describe the World Cup. For example, "Frank Lampard" is part of the "England Squad" and the "England Squad" competes in "Group C" of the "FIFA World Cup 2010".

The ontology also describes journalist-authored assets (stories, blogs, profiles, images, video and statistics) and enables them to be associated to concepts within the domain model. Thus a story with an "England Squad" concept relationship provides the basis for a dynamic query aggregation for the England Squad page "All stories tagged with England Squad".

This diagram gives a high-level overview of the main architectural components of this domain-driven, dynamic rendering framework.


The journalists use a web tool, called 'Graffiti', for the selective association - or tagging - of concepts to content. For example, a journalist may associate the concept "Frank Lampard" with the story "Goal re-ignites technology row".

In addition to the manual selective tagging process, journalist-authored content is automatically analysed against the World Cup ontology. A natural language and ontological determiner process automatically extracts World Cup concepts embedded within a textual representation of a story. The concepts are moderated and, again, selectively applied before publication. Moderated, automated concept analysis improves the depth, breadth and quality of metadata publishing.

Journalist-published metadata is captured and made persistent for querying using the resource description framework (RDF) metadata representation and triple store technology. An RDF triplestore and SPARQL approach was chosen over traditional relational database technologies due to the requirement to interpret metadata with respect to an ontological domain model. The high-level goal is that the domain ontology allows for intelligent mapping of journalist assets to concepts and queries. The chosen triplestore provides reasoning following the forward-chaining model, and thus inferred statements are automatically derived from the explicitly applied journalist metadata concepts. For example, if a journalist selects and applies the single concept "Frank Lampard", then the framework infers and applies concepts such as "England Squad", "Group C" and "FIFA World Cup 2010" (as generated triples within the triple store).
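
As a hedged illustration of the kind of query this enables (the endpoint URL, prefix and property names below are hypothetical placeholders, not the BBC's actual ontology terms), the aggregation "All stories tagged with England Squad" could be expressed roughly as follows; thanks to the inferred triples, a story explicitly tagged only with "Frank Lampard" would still match:

# Placeholder sketch: the endpoint, prefix and predicate are hypothetical, not the BBC's real vocabulary.
curl -H "Accept: application/sparql-results+xml" \
 --data-urlencode "query=
  PREFIX sport: <http://example.org/sport-ontology/>
  SELECT ?story WHERE { ?story sport:about sport:EnglandSquad }" \
 http://triplestore.internal.example/sparql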

This inference capability makes both the journalist tagging and the triple store powered SPARQL queries simpler and indeed quicker than a traditional SQL approach. Dynamic aggregations based on inferred statements increase the quality and breadth of content across the site. The RDF triple approach also facilitates agile modeling, whereas traditional relational schema modeling is less flexible and also increases query complexity.


Our triple store is deployed across multiple data centers in a resilient, clustered, performant and horizontally scalable fashion, allowing future expansion to additional ontologies and indeed linked open data (LOD) sets.

The triple store is abstracted via a JAVA/Spring/CXF JSR 311-compliant REST service. The REST API is accessible via HTTPS with an appropriate certificate. The API is designed as a generic façade onto the triplestore, allowing RDF data to be re-purposed and re-used across the BBC. This service orchestrates SPARQL queries and ensures that results are dynamically cached across data centers using memcached, with a low time-to-live (TTL) expiry of one minute.

All RDF metadata transactions sent to the API for CRUD operations are validated against associated ontologies before any persistence operations are invoked. This validation process ensures that RDF conforms to underlying ontologies and ensures data consistency. The validation libraries used include Jena Eyeball. The API also performs content transformations between the various flavors of RDF such as N3 or XML RDF. Example RDF views on the data include:

Automated XML sports stats feeds from various sources are delivered to and processed by the BBC. These feeds are now also transformed into an RDF representation. The transformation process maps feed supplier IDs onto corresponding ontology concepts and thus aligns external provider data with the RDF ontology representation within the triple store. Sports stats for matches, teams and players are aggregated inline and served dynamically from the persistent triple store.

The following "Frank Lampard" player page includes dynamic sports stats data served via SPARQL queries from the persistent triple store:



The dynamic aggregation and publishing page-rendering layer is built using a Zend PHP and memcached stack. The PHP layer requests an RDF representation of a particular concept or concepts from the REST service layer, based on the audience's URL request. If an "England Squad" page request is received by the PHP code, several RDF queries will be invoked over HTTPS to the REST service layer below.

The render layer will then dynamically aggregate several asset types (stories, blogs, feeds, images, profiles and statistics) for a particular concept such as "England Squad". The resultant view and RDF are cached with a low TTL (1 minute) at the render layer for subsequent requests from the audience. The PHP layer dynamically renders views based on HTTP headers, providing content-negotiated HTML and/or RDF for each and every page.
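
For example (PAGE_URL below stands in for any team, group or player page; the article does not give concrete URLs, and the RDF media type is assumed), the same page can be requested in either representation purely via the Accept header:

# PAGE_URL is a placeholder for a team, group or player page served by the PHP render layer.
curl -H "Accept: text/html" "$PAGE_URL"            # content-negotiated HTML view
curl -H "Accept: application/rdf+xml" "$PAGE_URL"  # RDF view of the same concept (media type assumed)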

To make use of the significant amount of existing static news kit and architecture (Apache servers, HTTP load balancers and gateway architecture), all HTTP responses are annotated with appropriately low (1 minute) cache expiry headers. This HTTP caching increases the scalability of the platform and also allows content delivery network (CDN) caching if demand requires.

This dynamic semantic publishing architecture has been serving millions of page requests a day throughout the World Cup with continually changing OWL-reasoned semantic RDF data. The platform currently serves an average of a million SPARQL queries a day with a peak RDF transaction rate of hundreds of player statistics per minute. Cache expiry at all layers within the framework is 1 minute, providing a dynamic, rapidly changing, domain- and statistic-driven user experience.

The development of this new high-performance dynamic semantic publishing stack is a great innovation for the BBC as we are the first to use this technology on such a high-profile site. It also puts us at the cutting edge of development for the next phase of the Internet, Web 3.0.

So what's next for the platform after the World Cup? There are many engaging expansion possibilities, such as extending the World Cup approach throughout the sport site; making BBC assets geographically 'aware' is another possibility, as is aligning news stories to BBC programs. This is all still to be decided, but one thing we are certain of is that this technological approach will play a key role in the creation, navigation and management of over 12,000 athletes and index pages for the London 2012 Olympics.


Jem Rayfield is Senior Technical Architect, BBC News and Knowledge. Read the previous post on the Internet blog that covers the BBC World Cup website, The World Cup and a call to action around Linked Data.


Metadata is data about data - it describes other data. In this instance, it provides information about the content of a digital asset. For example, a World Cup story may include metadata that describes which football players are mentioned within the text of the story. The metadata may also describe the team, group or organization associated with the story.

IBM LanguageWare Language and ontological linguistic platform.

RDF is based upon the idea of making statements about concepts/resources in the form of subject-predicate-object expressions. These expressions are known as triples in RDF terminology. The subject denotes the resource; the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For example, to represent the notion "Frank Lampard plays for England" in RDF as a triple, the subject is "Frank Lampard", the predicate is "plays for" and the object is "England Squad".

SPARQL (pronounced "sparkle") is an RDF query language; its name is a recursive acronym (i.e. an acronym that refers to itself) that stands for SPARQL Protocol and RDF Query Language.

BigOWLIM A high performance, scalable, resilient triplestore with robust OWL reasoning support

LOD The term Linked Open Data is used to describe a method of exposing, sharing, and connecting data via dereferenceable URIs on the Web.

JAVA Object-oriented programming language developed by Sun Microsystems.

Spring Rich JAVA framework for managing POJOs, providing facilities such as inversion of control (IoC) and aspect-oriented programming.

Apache CXF JAVA Web services framework for JAX-WS and JAX-RS.

JSR 311 Java standard specification API for RESTful web services.

Memcached Distributed memory caching system (deployed multi datacenter)

Jena Eyeball JAVA RDF validation library for checking ontological issues with RDF

N3 Shorthand textual representation of RDF designed with human readability in mind.

XML RDF XML representation of an RDF graph.

XML (Extensible Markup Language) is a set of rules for encoding documents and data in machine-readable form

Zend Open source scripting virtual machine for PHP, facilitating common programming patterns such as model view controller.

PHP Hypertext Preprocessor, a general-purpose dynamic web scripting language used to create dynamic web pages.

CDN A content delivery network or content distribution network (CDN) is a collection of computers usually hosted within Internet Service Provider hosting facilities. The CDN servers cache local copies of content to maximize bandwidth and reduce requests to origin servers.

OWL Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies.

    Comments (1)

    Patrice Neff Jul 13, 2010

    I like. I've been playing with semantic data and really like to hear that it can be used for real world scenarios. And the backend infrastructure with REST services, caching, etc. sounds similar to what we do at Memonic.

    The Neo4j REST Server - Part 1: Get it going!

    Introduction
    As requested and wished for by many, Neo4j finally has its own standalone server mode, based on interaction via REST. The code is still very fresh and not thoroughly tested, but I thought I might write up some first documentation on it, based on the Getting Started with REST wiki page.

    Installation

    The first version of the distribution can be downloaded from here: zip, tar.gz. After unpacking, you just go to the unpacked directory and run (on OSX/Linux - see the wiki entry for details on Windows)
    $ ./bin/neo4j-rest start
    which will start the Neo4j REST server at port 9999 and put the database files under a directory neo4j-rest-db/ (created lazily with the first request). Now, let's point our browser (not Internet Explorer, since it doesn't send any useful Accept headers and will get JSON back; this will be fixed later) to http://localhost:9999 and we will see the following:



    Things seem to be running! The reason for the HTML interface is the browser sending Accept: text/html. Now, setting the Accept header to application/json will produce:
    peterneubauer$ curl -H Accept:application/json -H Content-Type:application/json -v http://localhost:9999
    * About to connect() to localhost port 9999 (#0)
    *   Trying 127.0.0.1... connected
    * Connected to localhost (127.0.0.1) port 9999 (#0)
    > GET / HTTP/1.1
    > User-Agent: curl/7.19.7 (i386-apple-darwin10.2.0) libcurl/7.19.7 zlib/1.2.3
    > Host: localhost:9999
    > Accept:application/json
    > Content-Type:application/json
    * Connection #0 to host localhost left intact
    * Closing connection #0
    {
      "reference node":"http://localhost:9999/node/0"
    }

    Now, with "200 OK" this is a good starting point. We can see full references to the interesting starting points -the reference node and the index subsystem. Let's check out the reference node:
    peterneubauer$ curl -H Accept:application/json -H Content-Type:application/json -v http://localhost:9999/node/0
    * About to connect() to localhost port 9999 (#0)
    *   Trying 127.0.0.1... connected
    * Connected to localhost (127.0.0.1) port 9999 (#0)
    > GET /node/0 HTTP/1.1
    > User-Agent: curl/7.19.7 (i386-apple-darwin10.2.0) libcurl/7.19.7 zlib/1.2.3
    > Host: localhost:9999
    > Accept:application/json
    > Content-Type:application/json
    >
    {
    "incoming typed relationships":"http://localhost:9999/node/0/relationships/in/{-list|&|types}",
    "incoming relationships":"http://localhost:9999/node/0/relationships/in",
    "data":{},
    "traverse":"http://localhost:9999/node/0/traverse/{returnType}",
    "all typed relationships":"http://localhost:9999/node/0/relationships/all/{-list|&|types}",
    "outgoing typed relationships":"http://localhost:9999/node/0/relationships/out/{-list|&|types}",
    }
    This gives us some info about what node 0 can do, how to get its relationships and properties, and the syntax for constructing queries to get properties, create relationships, etc.

    Insert some data

    According to RESTful thinking, data creation is handled by POST, updates by PUT. Let's insert a node:
    peterneubauer$ curl -X POST -H Accept:application/json -v localhost:9999/node
    * About to connect() to localhost port 9999 (#0)
    *   Trying 127.0.0.1... connected
    * Connected to localhost (127.0.0.1) port 9999 (#0)
    > POST /node HTTP/1.1
    > User-Agent: curl/7.19.7 (i386-apple-darwin10.2.0) libcurl/7.19.7 zlib/1.2.3
    > Host: localhost:9999
    > Accept:application/json
    >
    {
    ...
    "data":{},
    ...
    }
    Resulting in a new node with the URL localhost:9999/node/1 (described by the "self" property in the JSON representation) and no properties set ("data":{}). The Neo4j REST API is really trying to be explicit about possible further destinations, making it self-describing even for new users, and of course abstracting away the server instance in the future. This makes dealing with multiple Neo4j servers easier in the future. We can see the URIs for traversing, listing properties and relationships. The PUT semantics on properties work like for nodes.
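    As a quick sketch of those PUT semantics (the property URIs below follow the pattern advertised by the node representation; check the wiki page for the authoritative paths), we could set properties on the node we just created:
    curl -X PUT -H Content-Type:application/json -d '{"name":"Thomas Andersson"}' -v localhost:9999/node/1/properties
    # ...or update a single property (the per-key URI is an assumption based on the same pattern):
    curl -X PUT -H Content-Type:application/json -d '"Neo"' -v localhost:9999/node/1/properties/name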
    We delete the node again with
    curl -X DELETE  -v localhost:9999/node/1

    and get 204 - No Content back. The node is gone and will give a 404 - Not Found if we try to GET it again.

    The Matrix

    Now with properties encoded in JSON we can easily start to create our little Matrix example:



    In order to create relationships, we do a POST on the originating node and post the relationship data along with the request (escaping whitespace and other special characters):
    curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"name":"Mr. Andersson"}' -v localhost:9999/node
    curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"name":"Morpheus"}' -v localhost:9999/node
    curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"name":"Trinity"}' -v localhost:9999/node
    curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"name":"Cypher"}' -v localhost:9999/node
    curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"name":"Agent Smith"}' -v localhost:9999/node
    curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"name":"The Architect"}' -v localhost:9999/node

    Getting http://localhost:9999/node/1, http://localhost:9999/node/2, http://localhost:9999/node/3 and so on back as the new URIs. Now, we can connect the persons (the escaping ruins readability a bit ...):
    curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"to":"http://localhost:9999/node/1","type":"ROOT"}' -v http://localhost:9999/node/0/relationships
    curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"to":"http://localhost:9999/node/2","type":"KNOWS"}' -v http://localhost:9999/node/1/relationships
    curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"to":"http://localhost:9999/node/3","type":"KNOWS"}' -v http://localhost:9999/node/2/relationships
    curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"to":"http://localhost:9999/node/4","type":"KNOWS"}' -v http://localhost:9999/node/2/relationships
    curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"to":"http://localhost:9999/node/5","type":"KNOWS"}' -v http://localhost:9999/node/4/relationships
    curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"to":"http://localhost:9999/node/6","type":"CODED BY"}' -v http://localhost:9999/node/5/relationships
    curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"to":"http://localhost:9999/node/1","type":"LOVES"}' -v http://localhost:9999/node/3/relationships

    Now, pointing our browser at http://localhost:9999/node/3/relationships/all will list all relationships of Trinity:
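    The same listing can be fetched from the command line with the request pattern we have been using:
    curl -H Accept:application/json -v http://localhost:9999/node/3/relationships/all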



    Our first traversal

    To start with, the Neo4j default Traverser framework (updated to be more powerful than the current one) is supported in REST, with other implementations like Gremlin and Pipes to follow. The documentation on the traversals is in the making here. There are a number of different parameters:
    http://localhost:9999/node/3/traverse/node specifies a return type of "node", returning node references. There are other return types such as relationship, position and path, returning other interesting info respectively. The Traverser description is pluggable and has default values - a full description looks like:
    {
    "order": "depth first",
    "uniqueness": "node path",
    "relationships": [
    { "type": "KNOWS", "direction": "out" },
    { "type": "LOVES" }
    ],
    "prune evaluator": {
    "language", "javascript",
    "body", "position.node().getProperty('date')>1234567;"
    },
    "return filter": {
    "language": "builtin",
    "name", "all"
    },
    "max depth": 2
    }

    To note here is the pluggable description of the "return filter" (what to include in the result) and the "prune evaluator" (where to stop traversing). Right now only JavaScript is supported for writing up these more complicated constructs, but other languages are coming. Very cool. To finish, let's get all the nodes at depth 1 from Trinity via a trivial traversal:
    curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"order":"breadth first"}' -v http://localhost:9999/node/3/traverse/node

    This just returns all nodes of all relationship types at depth one (the default) as a JSON array of node descriptions as above, in this case http://localhost:9999/node/1 and http://localhost:9999/node/2.
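
    As a slightly fuller sketch combining the traversal description format shown above with the Matrix data we just created, a traversal following outgoing KNOWS and any LOVES relationships down to depth 2 could be posted like this:
    curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{
      "order": "breadth first",
      "uniqueness": "node path",
      "relationships": [
        { "type": "KNOWS", "direction": "out" },
        { "type": "LOVES" }
      ],
      "return filter": { "language": "builtin", "name": "all" },
      "max depth": 2
    }' -v http://localhost:9999/node/3/traverse/node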

    Summary

    Having the Neo4j REST API, and with it the Neo4j REST Server, coming along is great news for everyone who wants to use a graph database over the network, especially from PHP or .NET clients that have no good Java bindings. A first client wrapper for .NET by Magnus Mårtensson from Jayway is already underway, and a first PHP client is on Al James' GitHub.
    This will even pave the way for higher-level sharding and distribution scenarios and can be used in many other ways. Stay tuned for a deeper explanation of the different traversal possibilities with Neo4j and REST in the next post!

    LinkedGeoData.org


    Online Access

    The following REST/Linked Data services are provided:

    Obtain information about points of interest in a circular area:

    http://linkedgeodata.org/triplify/near/%latitude%,%longitude%/%radius%

    An example obtaining points of interest in a 1000m radius around the center of Dresden is:


    Obtain information about points of interest in a circular area having a certain property:

    http://linkedgeodata.org/triplify/near/%latitude%,%longitude%/%radius%/%category%

    An example obtaining amenities in a 1000m radius around the center of Dresden is:


    Obtain information about points of interest in a circular area having a certain property value:

    http://linkedgeodata.org/triplify/near/%latitude%,%longitude%/%radius%/%property%=%value%

    An example obtaining pubs in a 1000m radius around the center of Dresden is:


    Obtain information about points of interest in a circular area belonging to a certain class:

    http://linkedgeodata.org/triplify/near/%latitude%,%longitude%/%radius%/class/%class%

    An example obtaining places of worship in a 1000m radius around the center of Dresden is:


    Obtain information about a particular point of interest (identified by its OSM id):

    http://linkedgeodata.org/triplify/node/%OSMid%

    An example obtaining information about the Cafe B'liebig in Dresden is:


    Obtain information about a particular way (identified by its OSM id):

    http://linkedgeodata.org/triplify/way/%OSMid%

    An example obtaining information about the Alte Mensa at TU Dresden is:


    Notes:

    • latitude and longitude are WGS84 coordinates (do not use scientific notation e.g. 6.0221418E-23)
    • radius is in metres
    • classes are defined in the LGD vocabulary
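
    As an illustrative sketch (the coordinates are approximate placeholders for the center of Dresden and "amenity" is one plausible category; the site's own examples give the exact values), such a request could look like:
    # Placeholder values: approximate Dresden coordinates, a 1000 m radius, and "amenity" as the category.
    curl "http://linkedgeodata.org/triplify/near/51.05,13.74/1000/amenity"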