
Swiss Address Visualization with WebGL


29a.ch/sandbox/2011/addresscloud/

As some of you know, I work for local.ch. I had been looking for cool visualizations to do with our data for quite a while, missing the obvious: plotting all of our 3.7 million geocoded addresses in 3D using WebGL! I'm actually quite impressed by the accuracy of the data. But go and have a look for yourself.

Controls

WASD + Mouse (drag). Velocity is scaled with altitude.

Video

If you can't see the demo for some reason, I have uploaded a short video of it to YouTube.

Techniques

The points are encoded in a Float32Array, then sorted and gzipped using a Python script. Sorting the data improves the compression ratio by over 200%, so it's well worth the effort. This brings the original 100 MB file down to 7 MB.
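A minimal sketch of that preprocessing step; the file names, input format, and sort key are assumptions, since the original script isn't shown:

    import gzip
    import struct

    # Load (lon, lat) pairs; the CSV input format here is an assumption.
    points = []
    with open("addresses.csv") as f:
        for line in f:
            lon, lat = map(float, line.split(","))
            points.append((lon, lat))

    # Sorting places similar coordinates next to each other, which makes
    # the byte stream far more repetitive; that is what helps gzip.
    points.sort()

    # Pack as little-endian float32 pairs (a Float32Array-compatible
    # layout) and gzip the result.
    payload = b"".join(struct.pack("<2f", lon, lat) for lon, lat in points)
    with gzip.open("addresses.bin.gz", "wb") as out:
        out.write(payload)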

The file is then loaded using XHR Level 2, which supports binary files and progress events. The points are rendered with WebGL as GL_POINTS, with additive blending used to give them a glow effect. In the future I might add HDR rendering and bloom.

There is no level-of-detail or culling performed, so this will require a relatively powerful rig. Also note that Firefox Aurora (9) seems to be quicker than Chrome Dev (16), for some mysterious reason. I would expect all of the work to be done by OpenGL, so I'm not sure where the difference comes from. It could be Chrome's process isolation.

Source code

You can find the source code on GitHub if you want to get into some hacking. Note that the data belongs to local.ch and may not be used.

Comments (1)

Patrice Neff Oct 12, 2011

Whoa, for me the page triggers a graphics card reset in Chrome.

MapFish


MapFish is a flexible and complete framework for building rich web-mapping applications. It emphasizes high productivity and high-quality development.

MapFish is based on the Pylons Python web framework, which it extends with geospatial-specific functionality. For example, MapFish provides specific tools for creating web services that allow querying and editing geographic objects.

MapFish also provides a complete RIA-oriented JavaScript toolbox, a JavaScript testing environment, and tools for compressing JavaScript code. The toolbox is composed of the ExtJS, OpenLayers, and GeoExt JavaScript toolkits.

MapFish is compliant with Open Geospatial Consortium (OGC) standards. This is achieved through OpenLayers and GeoExt, which support several OGC norms, such as WMS, WFS, WMC, KML, and GML.

MapFish is open source, and distributed under the BSD license.

Current release

MapFish 2.2 is the current release. Check out the reference documentation.

Interoperability

The MapFish framework is built around an open HTTP-based protocol, allowing various interoperable implementations. In addition to the reference implementation provided by the Python/Pylons-based framework, two other implementations are currently available:

  • a Ruby/Rails plugin (GPLv3)
  • a PHP/Symfony plugin (BSD)

See the documentation for more information.

MapFish Print

The MapFish project hosts MapFish Print, a Java library to print maps. MapFish Print is independent from the MapFish framework, but works well with it.

MapFish Print is released under the GPLv3 license.

See the MapFish Print Documentation for more information.

Affiliation


MapFish is a project of the Open Source Geospatial Foundation (OSGeo). OSGeo's mission is to support and build the highest-quality open source geospatial software.

Community

The MapFish project is governed by a Project Steering Committee, composed of people from various organizations.

Discussions regarding the project are public. They occur on the project mailing lists and on IRC (irc://irc.freenode.net/#mapfish).

The project welcomes new contributions; bugs can be reported and new features requested on the bug tracker. Also visit the wiki page describing how to contribute to the project.

More information related to the community and development is available in the Wiki/Trac.

News and Events

FOSS4G is the global conference focused on Free and Open Source Software for Geospatial, organized by OSGeo. In 2011, FOSS4G will be held in Denver. See http://2011.foss4g.org/.

Try Redis


Redis is what is called a key-value store, often referred to as a NoSQL database. The essence of a key-value store is the ability to store some data, called a value, inside a key. This data can later be retrieved only if we know the exact key used to store it. We can use the command SET to store the value "fido" at key "server:name":


    SET server:name "fido"

Redis will store our data permanently, so we can later ask "What is the value stored at key server:name?" and Redis will reply with "fido":


    GET server:name => "fido"
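For illustration only, here is the same pair of operations through the redis-py client; this is not part of the original tutorial, and it assumes a Redis server running locally on the default port:

    import redis

    # Connect to a local Redis server (assumed to be running).
    r = redis.Redis(host="localhost", port=6379)
    r.set("server:name", "fido")   # SET server:name "fido"
    print(r.get("server:name"))    # b'fido'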


PhpRedis


The phpredis extension provides an API for communicating with the Redis key-value store. It is released under the PHP License, version 3.01. The code was developed and maintained by Owlient from November 2009 to March 2011.

You can send comments, patches, and questions here on GitHub or to n.favrefelix@gmail.com (@yowgi).

Installing/Configuring

phpize
./configure
make && make install

make install copies redis.so to an appropriate location, but you still need to enable the module in the PHP config file. To do so, either edit your php.ini or add a redis.ini file in /etc/php5/conf.d with the following contents: extension=redis.so.

You can generate a Debian package for PHP 5, accessible from Apache 2, by running ./mkdeb-apache2.sh, or with dpkg-buildpackage or svn-buildpackage.

This extension exports a single class, Redis (and RedisException used in case of errors). Check out https://github.com/ukko/phpredis-phpdoc for a PHP stub that you can use in your IDE for code completion.

Install on OSX

If the install fails on OSX, type the following commands in your shell before trying again:

MACOSX_DEPLOYMENT_TARGET=10.6
CFLAGS="-arch i386 -arch x86_64 -g -Os -pipe -no-cpp-precomp"
CCFLAGS="-arch i386 -arch x86_64 -g -Os -pipe"
CXXFLAGS="-arch i386 -arch x86_64 -g -Os -pipe"
LDFLAGS="-arch i386 -arch x86_64 -bind_at_load"
export CFLAGS CXXFLAGS LDFLAGS CCFLAGS MACOSX_DEPLOYMENT_TARGET

See also: Install Redis & PHP Extension PHPRedis with Macports.

How does MySQL CASE work? - Stack Overflow


CASE is more like a switch statement. It has two syntaxes you can use. The first lets you use any comparison expressions you want:

CASE
    WHEN user_role = 'Manager' THEN 4
    WHEN user_name = 'Tom' THEN 27
    WHEN columnA <> columnB THEN 99
    ELSE -1 -- unknown
END

The second style is for when you are only examining one value, and is a little more succinct:

CASE user_role
    WHEN 'Manager' THEN 4
    WHEN 'Part Time' THEN 7
    ELSE -1 -- unknown
END

Comments (1)

Denis De Mesmaeker Oct 10, 2011

You can't do CASE id WHEN id > 4 ...; instead, use CASE WHEN id > 4 ...

Designing a Secure REST (Web) API without OAuth

Amazon Web Services has one of the largest and most used web APIs online right now, and they don’t support OAuth at all!

The server and the client both know a public and a private key; only the server and the client know the private key, but everyone can know the public key… who cares what they know.

A client creates a unique HMAC (hash) representing its request to the server. It does this by combining the request data (arguments and values, XML/JSON, or whatever it was planning on sending) and hashing the blob of request data along with the private key.

The client then sends that HASH to the server, along with all the arguments and values it was going to send anyway.

The server gets the request and re-generates its own unique HMAC (hash) from the submitted values, using the same method the client used.

The server then compares the two HMACs; if they are equal, the server trusts the client and runs the request.

  • [CLIENT] Before making the REST API call, combine a bunch of unique data together (this is typically all the parameters and values you intend to send; it is the “data” argument in the code snippets on AWS’s site)
  • [CLIENT] Hash (HMAC-SHA1, or preferably SHA256) the blob of data (from Step 1) with the private key assigned to you by the system.
  • [CLIENT] Send the server the following data:
  1. Some user-identifiable information like an “API Key”, client ID, user ID or something else the system can use to identify who you are. This is the public API key, never the private API key; it is a public value that anyone (even evil masterminds) can know, and you don’t mind. It is just a way for the system to know WHO is sending the request, not whether it should trust the sender (it will figure that out based on the HMAC).
  2. Send the HMAC (hash) you generated.
  3. Send all the data (parameters and values) you were planning on sending anyway. Probably unencrypted if they are harmless values, like “mode=start&number=4&order=desc” or other operational nonsense. If the values are private, you’ll need to encrypt them.
  • (OPTIONAL) The only way to protect against “replay attacks” on your API is to include a timestamp of some kind along with the request, so the server can decide if this is an “old” request and deny it. The timestamp must be included in the HMAC generation (effectively stamping a created-on time onto the hash), in addition to being checked “within acceptable bounds” on the server.
  • [SERVER] Receive all the data from the client.
  • [SERVER] (see OPTIONAL) Compare the current server’s timestamp to the timestamp the client sent. Make sure the difference between the two timestamps is within an acceptable time limit (maybe 5-15 minutes) to hinder replay attacks.
  1. NOTE: Be sure to compare timestamps in the same timezone, and watch out for issues that pop up with daylight saving time change-overs.
  2. UPDATE: As correctly pointed out by a few folks, just use UTC time and forget about the DST issues.
  • [SERVER] Using the user-identifying data sent along with the request (e.g. API Key) look the user up in the DB and load their private key.
  • [SERVER] Re-combine the same data together that the client did in the same way the client did it. Then hash (generate HMAC) that data blob using the private key you looked up from the DB.
  1. (see OPTIONAL) If you are protecting against replay attacks, include the timestamp from the client in the HMAC re-calculation on the server. Since you already determined this timestamp was within acceptable bounds, you have to re-apply it to the hash calculation to make sure it was the same timestamp originally sent by the client, and not a made-up timestamp from a man-in-the-middle attack.
  • [SERVER] Run that mess of data through the HMAC hash, exactly like you did on the client.
  • [SERVER] Compare the hash you just computed on the server with the hash the client sent you; if they match, then the client is considered legit, so process the command. Otherwise reject the command! (A sketch of both sides follows this list.)
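A minimal Python sketch of the signing scheme described above. The canonicalization (sorted key=value pairs joined with “&”) and the parameter names are assumptions; the post deliberately leaves the exact encoding up to you, as long as client and server build the identical byte string:

    import hashlib
    import hmac
    import time

    def sign_request(params, private_key):
        """Client side: stamp the params, canonicalize them, attach an HMAC."""
        params = dict(params, timestamp=str(int(time.time())))
        blob = "&".join("%s=%s" % (k, params[k]) for k in sorted(params))
        params["signature"] = hmac.new(
            private_key, blob.encode(), hashlib.sha256).hexdigest()
        return params

    def verify_request(params, private_key, max_skew=900):
        """Server side: check the timestamp window, then recompute the HMAC."""
        params = dict(params)
        sent_sig = params.pop("signature", "")
        if abs(time.time() - int(params.get("timestamp", 0))) > max_skew:
            return False  # outside the acceptable window; likely a replay
        blob = "&".join("%s=%s" % (k, params[k]) for k in sorted(params))
        expected = hmac.new(
            private_key, blob.encode(), hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, sent_sig)

The client sends the returned dict plus its public API key; the server looks the private key up by that API key and calls verify_request.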
You also slowly realize and accept that at some point you will have to implement OAuth, but it will probably be OAuth 2.0 support, and that isn’t quite ready yet.
What about the scenario where you are writing a public-facing API like Twitter, where you might have a mobile app deployed on thousands of phones and you have your public and private keys embedded in the app?
Additional Thoughts for APIs
What you can do, though, is issue private keys on a per-application basis instead of a per-user-account basis. That way, if the private key is compromised, that version of the application can be banned from your API until new private keys are generated, put into an updated version of the app, and re-released.

Update #1: There is some fantastic feedback, and there are some great ideas, on securing a web API down in the comments; I would highly recommend reading them.

Some highlights are:

Update #2: I have since looked at “2-legged OAuth” and it is, as a few readers pointed out, almost exactly the process described above. The advantage is that if you write your API to this spec, there are plenty of OAuth client libraries available for implementers to use.

The only OAuth-specific things of note are:

  • The OAuth spec is super-specific about how you need to encode your params, order them, and then combine them all together when forming the HMAC (called the “method signature” in OAuth)
  • OAuth, when using HMAC-SHA1 signing, requires that you send along a nonce. The server or “provider” must keep the nonce value, along with the timestamp of the request that used it, on record to verify that no other request comes in with the SAME nonce and timestamp (indicating a “replay” attempt); a sketch of this bookkeeping follows the list. Naturally you can expire these values from your data store eventually, but it would probably be a good idea to keep them on file for a while.
    • The nonce doesn’t need to be a secret. It is just a way to associate some unique token with a particular timestamp; the combination of the two is like a thumbprint saying “at 12:22pm a request with a nonce token of HdjS872djas83 was received”. And since the nonce and timestamp are included in the HMAC hash calculation, no nefarious middle-man can ever “replay” that previous message AND successfully hash his request to match yours without the server seeing the same timestamp + nonce combination come back in; at which point it would say “Hey! A request with this thumbprint showed up two hours ago, what are you trying to do?!”
  • Instead of passing all this as GET params, all these values get jammed into one giant “Authorization” HTTP header, comma-separated.
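A small sketch of the nonce bookkeeping described above, with an in-memory dict standing in for whatever data store you would actually use (Redis, a DB table with expiry, etc.):

    import time

    seen = {}  # (nonce, timestamp) -> time recorded

    def is_replay(nonce, timestamp, window=900):
        """Reject a request whose nonce+timestamp pair was already seen."""
        now = time.time()
        # Entries older than the window would fail the timestamp check
        # anyway, so they can safely be expired from the store.
        for key, recorded in list(seen.items()):
            if now - recorded > window:
                del seen[key]
        if (nonce, timestamp) in seen:
            return True
        seen[(nonce, timestamp)] = now
        return False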

Support


Posterous considered Amazon's EC2, but chose Rackspace instead. Here's why.

Posterous, the popular blogging service, is the dead simple way to post everything. You make posts by sending email.

"We wouldn't be here if it weren't for The Rackspace Cloud. We can spin up new servers in seconds, and the resources are so affordable, we're able to offer the service free to consumers - forever."
- Sachin Agarwal, Posterous, Founder

Head to Head: Why Rackspace Cloud Servers™ for Linux works for Posterous and Managed Cloud customers.

Support

Amazon EC2

To receive 24x7x365 support for Amazon EC2, you pay the greater of $400 per month or 10% (scaling down) of your EC2 costs. That means your support cost goes up with your EC2 usage and isn’t directly related to the amount of support you use.

Cloud Servers

Cloud Servers™ is backed by the legendary Fanatical Support you can only get from Rackspace.

With our core service level (free with Cloud Servers), customers get:

  • 24x7x365 Chat/Phone/Ticket Support
  • Access to The Rackspace Cloud Control Panel
  • 100% Network Uptime Guarantee
  • 100% HVAC/Power Uptime Guarantee
  • Access to our forums and online resources, and much more

The Cloud Servers™ with a managed service level option extends our world-class managed services from our Managed Hosting offering to the Rackspace Cloud. This offer provides an additional level of support on Cloud Servers, which includes monitoring, operating system and application infrastructure layer support, and technical guidance.

Third Party Software Support

Amazon EC2

Amazon does not provide support for third party software even if customers purchase the highest level of support.

Cloud Servers

The Cloud Servers™ with a managed service level option gives our customers support for a number of third party software including:

  • Linux operating system distributions (e.g., Ubuntu, Red Hat, Fedora, CentOS)
  • Microsoft Windows operating system images
  • Microsoft SQL Server
  • Apache
  • MySQL
  • .Net/IIS (Windows)

Admin Level Troubleshooting

Amazon EC2

Amazon Support will not log in to a customer’s EC2 server to help fix a problem.


Cloud Servers

If requested by a Cloud Servers™ with a managed service level customer, Rackspace Cloud support techs will log in to a customer’s Cloud Server to help fix a problem.

Persistence

Amazon EC2

Amazon EC2 instances are transient or ephemeral—if there is a host failure that causes your instance to terminate, all local data on that instance will be lost. Data persistence (not server persistence) can be added with Amazon EBS; however, EBS adds additional cost and complexity.

Cloud Servers

One of the most significant differences between Cloud Servers™ and EC2 is the persistence of each virtual server. Cloud Servers™ has access to local, RAID10 disk storage, much like you’d expect in a physical server. This is important because it means your server has inherent protection against drive failures. If for some reason the host does fail or becomes degraded, we will restart and/or migrate your Cloud Server for you. A failure doesn’t mean that your Cloud Server goes away.

Server Sizes

Amazon EC2

Amazon EC2 Standard Instances start at 1.7 GB, so if your workload requires fewer resources, you are stuck paying for much more than you need. Amazon recently introduced Micro Instances (starting at 613 MB) for customers needing CPU burst capabilities.

Cloud Servers

We provide a wide variety of Cloud Server sizes, starting at 256 MB and going up to 16 GB. Cloud Servers™ can be resized to scale without any reinstallation.

Hybrid Hosting

Amazon EC2

Amazon offers only part of the answer, with cloud-only solutions. Amazon VPC is a beta service offering that connects a company’s infrastructure to Amazon’s cloud; however, Amazon does not offer hosting on dedicated/managed servers.

Cloud Servers

Depending on your needs, you can get the best of both worlds with a combination of cloud and dedicated servers with our RackConnect™ solution. The best configuration for your business may span more than one platform. By mixing-and-matching compute platforms, Rackspace can help create the optimal compute solution for your business. Why settle when you can have both under one roof from Rackspace?

CPU Scheduling

Amazon EC2

Amazon EC2 instances have a capped CPU. If additional CPU capacity is required, you need to launch another instance. Amazon recently introduced Micro Instances which can be added (at an additional cost) for extra CPU resources.

Cloud Servers

Cloud Servers™ has guaranteed minimum CPU power (relative to the size of the Cloud Server), with free bursting when extra capacity is available on the host.

Compute Power

Amazon EC2

While the pricing of EC2 instances appears to be lower, if an instance takes more than twice as long to complete a task, the total price of completing that task increases proportionally.

Cloud Servers

A recent study conducted by an independent third party demonstrated that US-based Cloud Servers™ is, on average, more than two times more powerful than comparable Amazon EC2 servers.

Disk I/O

Amazon EC2

Amazon has a block storage solution that can show better performance than their built-in ephemeral storage under the right conditions; however, this solution results in additional costs, as both the amount of data stored and transferred are billed for.

Cloud Servers

A recent study conducted by an independent third party demonstrated that on average US-based Cloud Servers™ have a higher disk throughput than comparable Amazon EC2 servers.


IP Addresses

Amazon EC2

With EC2, the IP configuration is more complex. Each instance gets a non-persistent private IP address NATed to a public IP address. When instances terminate and new ones are launched, a new private IP address is assigned, which means you need to plan for changing private IPs (although Elastic IPs can be remapped). In addition, only one NATed IP address is available per instance, which, for example, does not lend itself to hosting multiple sites via SSL.

Cloud Servers

Each Cloud Server comes with the simplicity of a dedicated, persistent public IP address (no NAT), plus a second, private IP address for free. There is also low-latency bandwidth between your Cloud Servers™. Additional public IPs are available upon request, and shared IPs can be provided for high availability.

Open Philosophy

Amazon EC2

Amazon has not embraced an open-source approach for cloud interoperability.

Cloud Servers

The Rackspace Cloud approach is one that is standards-based and open. As an active member of organizations such as the Distributed Management Task Force (DMTF), The Rackspace Cloud collaborates to develop standards and promote interoperability. To help ensure that the community shaped the Cloud Servers™ API, The Rackspace Cloud solicited feedback and conducted intensive testing with its partners and cloud developers.

In 2010, Rackspace became a founding member of OpenStack, an open-source cloud platform designed to foster the emergence of technology standards and cloud interoperability.

EBS

We’re moving. Goodbye Rackspace.
If you're interested in what we work on, please apply - we're hiring: http://mixpanel.com/jobs/

At Mixpanel, where our hardware lives and the platform we use to help us scale have become increasingly important. Unfortunately (or fortunately) our data processing doesn’t always scale linearly. When we get a brand-new customer, we sometimes have to scale by a step function; this has been a problem in the past, but we’ve gotten better at it.

So what’s the short of it? We’re unhappy with the Rackspace Cloud and love what we’re seeing at Amazon.

Over our history we’ve used quite a few “cloud” offerings. First was Slicehost, back when everything ran on a single 256 MB instance (yeah, that didn’t scale). Second was Linode, because it was cheaper (money mattered to me at that point). Lastly, we moved over to the Rackspace Cloud because they cut a deal with Y Combinator (one of the many benefits of being part of YC). Even with all the lock-in we have with Rackspace (we have 50+ boxes, and we’re hiring if you want to help us move them!), it’s really not about the money but about the features and the product offering. Here’s why we’re moving:

EBS

IO is a huge scaling problem that we have to think about very carefully. We’ve since deprecated Cassandra from our stack, but Rackspace is a terrible provider if you’re using Cassandra in general. Your commit log and data directory should be on two different volumes; Rackspace does not make this easy or affordable. EBS is a godsend.

What happens when you need more disk space on Rackspace? You’re screwed: resize your box and go down. Need more than 620 GB of space? You can’t do it.

EBS lets you mount volumes onto any node. This is awesome if you ever need to move your data off a bad node instead of having to scp it over.

Edit: Nobody is saying you get better IO performance on Amazon, simply that EBS solves IO challenges that Rackspace does not. IO is basically terrible everywhere on the cloud.

Instances

We’re super excited about the variety of instances that Amazon offers. The biggest money savers we foresee are Amazon’s standard XL instances as well as the high-CPU ones. Rackspace offers a more granular variety, which is a benefit if you need to be thrifty, but it bottlenecks fast as you begin to scale and realize what kind of hardware you need.

Uptime

Rackspace Cloud has had pretty atrocious uptime: over the past year there have been two major outages where half the internet broke. Everyone has their problems, but the main issue is that we see really bad node degradation all the time. We’ve had months where a node in our system went down every single week. Fortunately, we’ve always built in the proper redundancy to handle this. We know this will happen on Amazon too from time to time, but we feel more confident about Amazon’s ability to manage it, since they also rely on AWS themselves.

Control Panel

Rackspace’s control panel is the biggest pain in the ass to use. Their interface is clunky, bloated, and slow. I can’t count how many times I’ve seen their Java exceptions while frantically trying to provision a new node to help scale Mixpanel.

Amazon has awesome, very well vetted command line tools that blow Rackspace out of the water. I can’t wait to write a script and start up a node. I believe Rackspace has an SDK and command line tools now too, though they’re very early and still in beta.

Quota limits

Probably the most frustrating thing about Rackspace is their insane requirement to post a ticket to get a higher memory quota. We’ve had fires where we needed to add extra capacity, only to get an error saying we can’t when creating a new node. Once you post a ticket, you have to wait up to 24 hours for their people to answer it. Now we just ask Rackspace for +100 GB increments well before we ever need them. I know Amazon doesn’t impose these limits to the same (annoying) extent.

Globalization

Amazon has a CDN and servers distributed globally. This is important to Mixpanel, as websites all over the world are sending us data. There’s nothing like this on Rackspace. We have lots of Asian customers, and speed matters.

Backups

Rackspace has a limit on their automatic backups: 2 GB. Our databases aren’t weak 2 GB machines; nobody’s are at scale. S3 is a store for everything, and EBS is useful for exactly this kind of thing. Cloud Files on Rackspace is still in its infancy.

Backups on Amazon will be so much cleaner and more straightforward.

Pricing

We’ve done a very methodical pricing comparison for our own hardware and determined that pricing is actually about the same across both services. We don’t know how well the hardware will scale on Amazon, so we over-estimated to make up for crazy issues. Amazon came out about 5-10% cheaper, but take that with a grain of salt; it’s probably closer to equal.

Here’s one huge thing though: in the long run, Amazon will be drastically cheaper for our business, thanks to the concept of Reserved Instances and bid-priced Spot Instances. That’s extremely sexy to us.

Also? Amazon constantly reduces its prices. I’ve never seen Rackspace do that.

What’s the main reason for us?

Amazon just iterates on their product faster than anyone else and has the best one. We expect people to use us for the same reason in the long run, and we’ve taken note. Rackspace is extremely slow, and as the person in charge of infrastructure and scalability, I’m going to use the platform that keeps guys like me happy by running fast and anticipating my needs.

Amazon’s products:

Rackspace cloud’s products:

If you have an opinion, express it. Tell us what you hate about Amazon and the problems you’ve seen. We haven’t moved yet.

If you're interested in what we work on, please apply - we're hiring: http://mixpanel.com/jobs/

Vasile Cotovanu - Google+ - My focus during http://makeopendata.ch/ days was (what else…

My focus during the http://makeopendata.ch/ days was (what else?) geo-technology hacking; this time I played with the OpenLayers library, and especially the recently released api.geo.admin.ch API.

In one of the projects there was a need for a KML/track editor, where a user can easily register (digitize) a track (polyline) on top of Swisstopo material and save it "somewhere" so that he/she can see it later on. That was the perfect challenge for me to get started with the GeoAdmin JS library and explore its capabilities. For storing the data I chose Fusion Tables, as it was very handy and didn't require any DB setup.

The hack is available at:
http://www.vasile.ch/hacks/geoadmin-tracks-editor/
The source code + HOWTOs:
https://github.com/vasile/GeoAdmin-ManageTracks

Scrapy | An open source web scraping framework for Python


Scrapy is a fast, high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
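To give a flavor of the API, here is a minimal spider sketch using the current Scrapy interface; the site URL and CSS selectors are placeholders, not part of the project's own docs:

    import scrapy

    class QuotesSpider(scrapy.Spider):
        """Crawl a site and yield structured items from each page."""
        name = "quotes"
        start_urls = ["http://quotes.toscrape.com/"]  # placeholder site

        def parse(self, response):
            # Extract structured data with CSS selectors (placeholders).
            for quote in response.css("div.quote"):
                yield {
                    "text": quote.css("span.text::text").get(),
                    "author": quote.css("small.author::text").get(),
                }
            # Follow pagination links and keep crawling.
            next_page = response.css("li.next a::attr(href)").get()
            if next_page:
                yield response.follow(next_page, callback=self.parse)

Run it with: scrapy runspider quotes_spider.py -o quotes.json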
