A major feature of the Romana Project is topology-aware IPAM, and an integral part of it is the ability to assign consecutive IP addresses in a block (and reuse freed up addresses, starting with the minimal).
Since IPv4 addresses are essentially unsigned 32-bit integers, the problem is basically that of maintaining a sequence of uints while allowing reuse.
To that end, a data structure called IDRing was developed, and I'll describe it here. It has not yet been factored out into a separate project, but since Romana is under the Apache 2.0 license, it can still be reused.
- The IDRing structure is constructed with the NewIDRing() method, which takes the lower and upper bounds (inclusive) of the interval from which to give out IDs. For example, specifying 5 and 10 will allow one to generate the IDs 5, 6, 7, 8, 9, 10 and then return errors because the interval is exhausted.
Optionally, a locker can be provided to ensure that all operations on the structure are synchronized. If nil is specified, synchronization is the responsibility of the caller.
- To get a new ID, call GetID() method. The new ID is guaranteed to be the smallest available.
- When an ID is no longer needed (in our use case — when an IP is deallocated), call ReclaimID().
- A useful method is Invert(): it returns an IDRing whose available IDs are the ones allocated in the current one, and whose allocated IDs are the ones available in the current one. In other words, the inverse of an IDRing with min 1, max 10, and IDs 4 through 8 taken is an IDRing with the available ranges [1,3] and [9,10].
- You can see examples of the usage in the test code and in actual IP allocation logic.
- Persisting it is as easy as using the locker correctly and encoding the structure to, and decoding it from, JSON.
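The behavior described above can be sketched in a few lines of Python (the actual Romana implementation is in Go; the class and method names here merely mirror the description, not the real code):

```python
import heapq

class IDRing:
    """Hands out the smallest available ID in [lo, hi]; reclaimed IDs are reused first."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.next_fresh = lo   # smallest never-allocated ID
        self.reclaimed = []    # min-heap of returned IDs

    def get_id(self):
        # Reclaimed IDs are always smaller than next_fresh, so popping
        # the heap first preserves the "smallest available" guarantee.
        if self.reclaimed:
            return heapq.heappop(self.reclaimed)
        if self.next_fresh > self.hi:
            raise ValueError("interval exhausted")
        self.next_fresh += 1
        return self.next_fresh - 1

    def reclaim_id(self, i):
        heapq.heappush(self.reclaimed, i)

ring = IDRing(5, 10)
print([ring.get_id() for _ in range(3)])  # [5, 6, 7]
ring.reclaim_id(6)
print(ring.get_id())                      # 6: the freed ID is reused first
```

Because the reclaimed pool is a min-heap, a freed address is always handed out before any fresh one, which is exactly the "reuse freed-up addresses, starting with the minimal" property.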
Here I will discuss a metadata-driven data collection platform.
Here I will describe the Nanoput project, which comprises a large part of OpenDSP’s DMP (Data Management Platform). There are, of course, other pieces — the entire picture will be painted under the DMP tag.
The overall DMP stack is as follows:
- Nanoput proper: NGINX with Lua to handle incoming event requests. I'll confess that at the moment this is a bit of a pet (as in, not cattle). We are considering using OpenResty instead of rolling our own setup, which already uses parts of OpenResty. But no matter; here I will present some features that can be achieved with this setup, and one instance is capable of handling all this.
- Redis for storing and manipulating user sets; its ZSET (sorted set) type is great for this
- MySQL for storing metadata — will be described in a separate post
- PHP/JS for a simple Web interface to define the said metadata
- Python for translating metadata into the configuration for NGINX
- AWS S3 for storing raw logs — pre-partitioned so that EMR can be used easily.
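To illustrate the metadata-to-NGINX translation step, here is a hypothetical Python sketch; the `/dyn/` URL prefix, log paths, and event names are invented for illustration, not Nanoput's actual layout:

```python
# Hypothetical sketch: render NGINX location blocks from event metadata,
# one tracking-pixel endpoint and access log per defined event.
TEMPLATE = """location /dyn/{name} {{
    empty_gif;
    access_log /var/log/nanoput/{name}.log {log_format};
}}"""

def render_locations(events, log_format="nanoput"):
    """Return the NGINX config fragment for a list of event names."""
    return "\n\n".join(
        TEMPLATE.format(name=e, log_format=log_format) for e in events
    )

print(render_locations(["retarget", "converted"]))
```

The real generator would also validate names and reload NGINX; the point is only that the URL space is driven entirely by metadata, not hand-edited config.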
Conceptually, let's consider the idea of an “event”. An impression, a conversion, a video-tracking event, a site visit: anything that fires a request to our Nanoput is an event. You may recognize a similarity with Snowplow, and that is because we are solving a similar problem.
Events come from two main sources:
- Exchanges or DMPs, via an exchange-initiated cookie sync: see below.
- Regular user behavior: impressions (and, in the case of video, video-tracking events) and conversions.
Now, let us also consider the idea of a “user segment”. If you think about it, a segment is just a set of users. Thus, we may as well consider a user who produced a certain event as belonging to some segment. Some segments are explicitly asked for, such as “users we want to retarget” or “users who converted”, but there is no reason why any event cannot be defined as a segment-defining one.
Segments, here, are a special case of data collections concept discussed in a different post.
Given that, we can now dive into the Nanoput implementation.
General data acquisition idea
Static data acquisition URLs
By “static” we mean the common use cases that are simply part of Nanoput (hence the “man” subdirectory you will notice in the examples; like the Unix man command, it stands for “manual”). Here we have:
- Site events (essentially, those are an extension of retargeting concept).
- Standard event tracking, by which we mean the standard events of the ad world.
Dynamic (metadata-driven) data acquisition URLs
On every “event”, a script runs. We use Redis's awesome Sorted Set functionality here, inserting things twice. The key idea, again, is a variation on dealing with data-gravity concerns by simply duplicating storage. We create two sorted sets for each key, the “score” being, respectively, the first and last time we have seen the user. The reasoning for this is that:
- First-seen: we can write batch scripts to get rid of users we have last seen over X days ago (expiring targeting).
- Last-seen: helps us with conversion attribution (yes, this assumes naive last-click attribution or variants).
Duplication is not just for every user — it is for every set. The key here is the set (or segment) name, and the value is the set of users.
An added benefit of this is that new segments can be created by using various Redis set operations (union, intersection) easily.
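The first-seen/last-seen bookkeeping can be sketched in pure Python, with dicts standing in for the two Redis sorted sets. In Redis terms, first-seen would be a ZADD with the NX flag (add only if absent) and last-seen a plain ZADD (which overwrites the score); the key names here are invented:

```python
# Sketch of per-segment first-seen / last-seen tracking. Plain dicts
# stand in for two Redis sorted sets keyed by segment name.
class Segment:
    def __init__(self, name):
        self.name = name
        self.first_seen = {}  # user -> timestamp; like ZADD <key>:first NX
        self.last_seen = {}   # user -> timestamp; like ZADD <key>:last

    def record(self, user, ts):
        self.first_seen.setdefault(user, ts)  # keep the earliest score
        self.last_seen[user] = ts             # overwrite with the latest

    def expire_before(self, cutoff):
        """Batch expiry: drop users first seen before the cutoff
        (like ZRANGEBYSCORE on the first-seen set, then removal)."""
        stale = [u for u, ts in self.first_seen.items() if ts < cutoff]
        for u in stale:
            self.first_seen.pop(u)
            self.last_seen.pop(u, None)
        return stale

seg = Segment("retarget")
seg.record("u1", 100)
seg.record("u1", 500)
print(seg.first_seen["u1"], seg.last_seen["u1"])  # 100 500
```

Since each segment is just a set of members, new segments can be derived with set operations, which in real Redis would be ZUNIONSTORE and ZINTERSTORE.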
Some useful shortcuts for a DMP
- Getting OS/browser info without necessarily using WURFL (though WURFL can easily be fronted by NGINX too, actually).
Exchange cookie sync
In the display world, there is a need for cookie syncing between a DSP and a third-party DMP or an exchange/SSP. The sync can be exchange-, DMP-, or DSP-initiated, or both, and some exchanges may allow the redirect chain to proceed further while others may not. Nanoput provides this functionality for the exchanges we deal with, as well as a template for doing it with other partners, at the speed that NGINX provides. Here are the moving parts:
- On a partner (SSP/exchange or DMP)-initiated sync, the /man/xchin URL calls the xchin.lua script which, depending on the known partner policy, responds with the proper redirect (coded specifically to allow the maximum number of further redirects). You may notice that this could be initiated by the DSP itself, and you would not be wrong.
- The DSP itself may initiate a cookie sync; this is controlled by the xchout.lua script.
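As a rough illustration of the decision an xchin-style handler makes, here is a hypothetical Python sketch; the partner table, sync URL, and uid parameter name are all invented for the example:

```python
# Hypothetical sketch: given a partner's sync policy, build the next
# redirect in the cookie-sync chain carrying our user ID.
from urllib.parse import urlencode

PARTNERS = {
    # Made-up policy: where to redirect, and what the partner calls our uid.
    "exchangeA": {
        "redirect": "https://sync.exchangea.example/match?{qs}",
        "uid_param": "ext_uid",
    },
}

def sync_redirect(partner, our_uid):
    """Return the Location header value for a partner-initiated sync."""
    policy = PARTNERS[partner]
    qs = urlencode({policy["uid_param"]: our_uid})
    return policy["redirect"].format(qs=qs)

print(sync_redirect("exchangeA", "u-123"))
```

In the real setup this lookup and string formatting live in Lua inside NGINX, so the redirect is served without leaving the web server.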
Storing for further analysis
Raw logs, formatted as above, are uploaded to S3. Notice that they are stored twice, with different partitioning schemas. This is one of the key ideas in Nanoput: storage is cheap, so we duplicate it and use one or the other partitioning schema depending on the use case:
- Partitioned by date — useful for most internally-provided reporting
- Partitioned by user — here, it's important to note that this is a multi-tenant system; in this context, a “user” is a client/customer. This partitioning is useful for giving customers the ability to run their own custom queries. See also the notes on data duplication and data gravity.
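The double-write can be sketched as follows; the bucket layout and key names are assumptions for illustration, not the actual Nanoput schema:

```python
# Sketch: the same log file is written under two S3 key prefixes,
# one partitioned by date (for reporting) and one by customer
# (for tenant-scoped custom queries). Layout is hypothetical.
from datetime import date

def s3_keys(bucket, customer, day, fname):
    by_date = f"s3://{bucket}/by-date/dt={day:%Y-%m-%d}/{fname}"
    by_customer = f"s3://{bucket}/by-customer/{customer}/dt={day:%Y-%m-%d}/{fname}"
    return by_date, by_customer

d, c = s3_keys("nanoput-logs", "acme", date(2016, 5, 1), "events.log")
print(d)  # s3://nanoput-logs/by-date/dt=2016-05-01/events.log
print(c)  # s3://nanoput-logs/by-customer/acme/dt=2016-05-01/events.log
```

Because each prefix matches one access pattern, EMR jobs can prune partitions cheaply instead of scanning everything, which is the payoff for duplicating the storage.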
If you’ve been there, or even if you just looked at the website, you’d realize that the owner put quite an effort into it being a Viennese-style coffee house, with all the interior design decisions that go with it.
Now, a local coffee shop is often a place where people expect to post some local notices and ads (“lost dog”, “handyman available”, “local church choir concert”, etc). And here’s a conundrum. A simple cork bulletin board with a bunch of papers pinned to it just did not seem to fit the overall mood/interior/decor of the cafe:
Yet the cafe does want to serve local community and become an institution.
This being Silicon Valley, Val, the Kaffeehaus owner, had a vision: what about a virtual board, as a touch screen?
The name was quickly chosen to be Wildboard — because it is, well, a bulletin board and in honor of the boar’s head that is prominently featured on the wall:
A multi-touch-based virtual bulletin board sounded interesting. Most touch-screen kiosks I've seen so far (in hotels and malls, for instance, or things like ImageSurge) only allow tap, not true multi-touch. (To be honest, multi-touch may or may not be useful; see below and also the P.S. But it is a very nice bit of pizzazz.)
And we — that is, myself and Vio — got to work. And in short order we had:
- Wildboard “board server” — a Python app running on the same computer as the UI. It is responsible for polling the web server (below) and serving information to the UI (source).
- Wildboard web server — a PHP app based on an existing web classifieds application (source). This allows users to submit ads (or they can do so via a mobile app, as below). It is also modified to automatically create QR codes based on user-provided information (map, contact, calendar, etc.) and add them to an ad.
- Wildboard mobile app — PhoneGap/Cordova based app for both Android and iPhone (source)
This app allows one to:
- Post an ad
- Scan an ad’s QR code
- And, finally, for the “Wow!” effect during the demo, one can drag an ad from the screen into the phone. Here it is, in action:
- Wildboard orchestrator — a Node.js app (source) designed to coordinate interactions between the mobile app and the board. It is the one that determines which mobile app is near which board and orchestrates the fancy “drag” operation shown above.
- For more information, check out the spec and the writeup.
Charismatic Val somehow managed to get a big touch screen from Elo Touch. Here’s how it fit in the decor:
A network of such bulletin boards, allowing hyper-local advertising, seems like a good idea. Monetization can be done in a number of ways:
- Charging for additional QR codes — e.g., map, contact, schedule.
- Custom ad design (including interactive and advanced multimedia features — sound, animation, video).
- A CPA (cost-per-acquisition) model, while tracking interaction via an app — per saved contact, per scheduled appointment, per phone call.
- Premium section.
But… alas… This is as far as we got.
P.S. One notable exception is a touch-screen showing suggestions in Whole Foods in Redwood City.
Every marketer, it seems, wants to participate in real-time bidding (RTB). But what is it that they really want?
They want an ability to price (price, not target!) a particular impression in real-time. Based on the secret-sauce business logic and data science. Fair enough.
But that secret sauce, part of their core competence, is just the tip of the iceberg — and the submerged part is all that is required to keep that tip above water. To wit:
- Designing, developing, testing and maintaining actual code for the UI for targeting, the bidder for bidding, reporting, and data management
- Scaling and deploying such code on some infrastructure (own data center, or clouds like AWS, GCE, or Azure), etc.
- Integrating with all exchanges of interest, including the following steps:
- Code: passing functional tests (understanding the exchange’s requirements for parsing request and sending response)
- Infrastructure: ensuring the response is being sent to the exchange within the double-digit-millisecond limit
- Scaling: As above, but under real load (hundreds of thousands of queries per second)
- Business: Paperwork to ensure seat on the exchange, including credit agreements when necessary
- Operations: Ongoing monitoring of the operations, including technical concerns (increased latency) and business concerns (low fill level, high disapproval level), whether these are raised by clients or exchange partners or, ideally, addressed pro-actively in-house.
None of this is their core competence. We propose to address the underwater part. It'll be exciting.
Enter OpenDSP. We got something cool coming up here. Stay tuned.
It’s been a fun ride and I guess it will continue.