This post is a compendium of the Discovering GeoDB post series, adapted to make it easier for newcomers to read.
Part I. The power of place
“There are three things that matter in property: location, location, location”
Attributed to Harold Samuel 
The tricolon “location, location, location” is a cliché widely used by property experts, appearing in print as early as 1926. It was popularized by Harold Samuel, who acquired a small property concern called Land Securities Investment Trust at the end of the Second World War and, thanks to his ability to identify great business opportunities in the real estate sector, grew it into one of the largest companies on the London Stock Exchange [2, 3].
You may not know it, but several companies are using your private data right now. And there is one thing they’re particularly interested in: your location. Companies analyze large volumes of data on a daily basis to understand their environment, study their competitors, discover opportunities and trends, optimize their supply chains and much more, and in many cases, location is absolutely crucial to contextualize the data and carry out useful analysis.
Perhaps the above does not surprise you, or maybe it does, but you’re likely unaware that several studies have shown that location is the private data that concerns us most [4, 5, 6]. And how much are companies paying you for the use of your most sensitive private information? We don’t want to disappoint you; we know there are several types of rewards, but we ask the question so you can think about it and about its implications.
In this first part, we’ll review several papers that reveal the psychological value we attach to our private location data. Later, we’ll review other studies that show the great value this information has for companies. It’s likely that after reading them, your answer to the previous question will change significantly.
But, what do you know about this?
Before talking about private location data, we want to write a few lines about our background, so that you don’t end up thinking we don’t have a dog in this fight.
Our team is made up of people who have worked in this sector for years, with experience on platforms used worldwide. We put users’ privacy first, and this has allowed us to gather very valuable information about users’ concerns regarding their privacy.
We’ve heard multiple voices in multiple languages, we’ve reflected, we’ve investigated and we’ve detected the need to develop a new paradigm to transform the current situation.
Our experience led us to identify a solution that meets a real need, and that process pushed us to jump into this exciting project.
Demand control over your private data
“Don’t sell your soul to buy peanuts for the monkeys.”
Dorothy Salisbury Davis 
Surely you’ve seen some commercials from Mastercard’s Priceless campaign; that’s what happens when an advertising campaign runs for more than two decades. Would you say that your privacy is priceless? Then why do you sell it so cheaply?
You may think you’re not selling your privacy, but we encourage you to think about it. Do you have a loyalty card? Do you use a social network? Do you use a smartphone? Do you use an internet browser? Do you use the Internet? We regret to say that, whether you like it or not, as you read this line you are selling a bit of your privacy, a bit of your life.
Ask someone how much money they want for their private data and they’ll probably name a high sum, or even be offended by the offer. Yet camouflage the same exchange under a loyalty program that rewards their private data with branded merchandise, and you end up with a happy private data provider, a happy customer. We did not invent this; it’s a documented fact [5, 8].
Companies know about this bias in our behavior and exploit it in their favor. They need our private data and they know we’re reluctant to give it to them, so they adapt their message to make you think they’re giving you something very valuable for free. You know the saying: if you’re not paying for it, you’re probably the product.
Having said that, have you ever thought about how companies use your private data? Usually they are not plotting anything murky; rather the opposite. It’s difficult to make a precise classification, but we could say that our data is useful for things like i) offering better services, ii) optimizing supply chains, iii) growing the clientele or iv) discovering new opportunities.
Things like these allow them to:
- Offer the best mobile telephone coverage.
- Minimize costs to give low prices.
- Build supermarkets and gas stations in busy areas with easy access.
- Optimize navigation routes and estimate their duration.
- Give the best roadside assistance coverage.
- Offer fast and economical parcel services.
- And many other things [9, 10].
But let’s not think only about the private sector; let’s also think about everything that is possible in the public sector. Using these tools in public administration makes it possible to:
- Build hospitals and schools in the best locations.
- Create appropriate evacuation routes.
- Find the best sites for emergency services.
- Deploy the necessary infrastructures.
For your peace of mind, you should also know that nobody is interested in your individualized data; in fact, the data of an isolated individual is not useful for this type of analysis. It’s the aggregation of data from multiple individuals that makes it possible to respond to situations like the ones above.
The crux of the issue is not how our private data is used, but how it’s collected. It’s inadmissible that companies can capture and exploit our private information without our consent. And it’s not ethical that, using psychological tricks, they can generate great economic benefits while paying peanuts for the raw material.
The collection and use of private data based on deception cannot continue to be the norm. This approach generates suspicion and rejection among users and, more importantly, opens privacy gaps.
It’s necessary to send users a clear message so they understand that providing their private data through the right channels allows them to:
- Decide how their data is used.
- Avoid privacy risks.
- Obtain a fair economic reward.
- Enjoy optimized services.
We must demand control over what we sell, under what conditions and at what price.
The money that we demand in exchange for our private location data
For how much money would you sell your private location data, sampled every five minutes, over the next month? It’s a delicate question, isn’t it?
In 2013, the Financial Times published an interactive calculator that determines a price for our personal data based on pricing benchmarks supplied by data brokers.
Playing with the calculator we obtain values ranging from half a dollar to about two dollars. For simplicity, let’s say that the average cost according to that calculator is around $1.
Buyers are not interested in the data of individual users, but in large datasets with the data of millions of users. Based on the ratio 1 person = $1, it is easy to estimate the cost of the data of a million users.
However, the previous cost is based on the price given by the Financial Times. But would you sell your personal data for $1?
The problem is that companies ask and answer at the same time: they pose the question and tell you the answer. Setting that price is a way to self-justify the use of indirect formulas to capture user data: “How can I convince a user to give me their private data in exchange for a dollar?”
It’s in our own interest to answer the question for ourselves, and researchers have already tried to give an honest answer to it.
In 2010, Ancient and Frog Design carried out a study to quantify the value of the personal data that individuals would give up in exchange for an IT service; the results are summarized below.
The study shows two interesting conclusions:
- We want much more than a dollar in exchange for our personal data.
- We value certain kinds of information more highly than others.
If we remove from this study the data that can only be revealed once (social security number, government ID, credit card information, social profile, contact information and demographic information), the most valued categories of private data generated on a daily basis are:
- Digital communication history. $59
- Web search history. $57
- Physical location history. $55
- Web browsing history. $52
This difference in how we value different kinds of information has been corroborated in subsequent studies. Specifically, Staiano, J. et al., in their 2014 paper ‘Money Walks: A Human-Centric Study on the Economics of Personal Mobile Data’, found evidence that “location is the most valued category of personally identifiable information … and that bulk information is valued much higher than individual information”. It should not surprise us that, with the proliferation of smartphones that accompany us continuously wherever we go, location has become the most valued category of private information.
But is that the price of our private data? No, that is the price for which some users were willing to disclose their private data under certain conditions.
The value for which we’re willing to disclose some of our private location information was studied in 2005 by Danezis, G. et al. and in 2006 by Cvrcek, D. et al.
The first study used deception: the authors asked several people whether they would be willing to provide their private location data in exchange for money. This allowed them to measure factors such as how many users were interested, what amount of money they demanded, and how the expected use of the data influenced the price.
That study showed some interesting results, but it was done on a relatively small scale at Cambridge University, so the second study was carried out using “a sample of over 1200 people from five EU countries, and used tools from experimental psychology and economics to extract from them the value they attach to their location data”. The size of the sample allowed them to “compare [the] value across national groups, gender and technical awareness, but also the perceived difference between academic use and commercial exploitation.”
We highlight below some interesting results found in this study:
- Women are possibly more sensitive to what the collected data may be used for.
- The participants did not perceive their unusual movements as more sensitive than their everyday behaviour.
- The participants were more sensitive to the purpose of the data collection than to the duration and quantity of data collected.
- There are huge differences among countries in the sensitivity to the time extension.
- The basic results confirm those of the Cambridge study regarding the overall value of bids, e.g., median bids of 20 GBP and 43 EUR (about 28 GBP at August 2006 exchange rates), respectively, for non-commercial use of the data.
Well, it seems that we expect to receive much more than $1 for our personal data, don’t you think? And consider that these results are from 2006: since then prices have gone up, users’ awareness of mobile technology has increased, and location has become the most valued category of private information. We’re convinced that the price today would be higher, but we don’t want to suggest one; you’ll decide yours.
It seems we’ve raised the amount of money that users should receive in compensation for providing their private location data. But is users’ private location really that valuable?
Why your private location is valuable
The big data market was worth $125 billion in 2015, which gives a sense of just how much financial capital enterprises are pouring into data operations. But big data is not just about collecting and processing huge volumes of data: if the data you are storing and analyzing is full of inconsistencies, inaccuracies or other issues, the analytic results you obtain will be misleading.
It’s estimated that 80% of business data contains a location component, so it’s critical to understand how it affects businesses. Analyzing it can provide insights that support and improve decision-making in many aspects of a business. Analyzing data by location allows businesses to ask and accurately answer questions such as “where are my customers?” or “how far are my customers from my location?” as well as “how well does my supply chain service those customers?”.
Thanks to the ubiquity of smartphones, it’s possible to bring order to the data to be analyzed by adding location information to it. If you’re able to contextualize your data in this way, you can “reveal relationships between data sets that might not have otherwise been obvious or easy to ascertain and, through location analytics, arrive at the kind of insights that get reflected in the bottom line”.
That last quote is taken from an article published in 2017 by Forbes Insights, based on interviews with executives who are using location in their big data analysis. In the article, the executives talk openly about how this information is, in many cases, a critical element for producing analysis that is not misleading.
We would like to highlight some parts of the interview with Nigel Lester, managing director for Pitney Bowes in the Australia-New Zealand region, since we believe they masterfully capture the importance of location for big data analysis.
“The real strength of location data is that it becomes a common link between seemingly disconnected silos of business data … Data that doesn’t seem to have any obvious relationship can be contextualized by location. It could be your customer locations versus your competitor’s locations — data sets with no obvious link, but if you start to geo-enrich them, you may find that relationships begin to emerge and you’ll be able to build out a more holistic and valuable view of your customers.” 
We finished the previous section with the question “is users’ private location really that valuable?”. We’ll let you answer it for yourselves.
In the next entry we’ll dig deeper into how to find the balance between the economic interests of users and the economic interests of companies.
Part 2. Game theory
“An equilibrium is not always an optimum; it might not even be good. This may be the most important discovery of game theory.”
Ivar Ekeland, 2006 
In our previous part, we discussed the value of private location data, analyzing i) the value we give to that information as users and ii) its usefulness and effectiveness for big data analysis if we’re a company. We consciously avoided setting a price, allowing readers to draw their own conclusions. However, we did set two premises on top of which we build our proposal:
- Users consider their location as their most valuable private data. In exchange for this information they demand much more money than they currently receive.
- Companies are aware of the importance of location for analysis. They know that this information allows them to connect data silos and extract knowledge from them.
We also reviewed how companies are already obtaining our private location data, using psychological tricks to get our private information in exchange for very little money. This approach leads to an equilibrium between supply and demand that is far from optimal, or even good:
- Users are selling their private data without being aware of it, obtaining derisory benefits in return.
- Companies are distorting reality, making users believe that demand is low and that the price therefore cannot be higher.
In this part we review, from the perspective of game theory, and specifically through the economic model of supply and demand, how in a free and competitive market the equilibrium between supply and demand sets the price of the goods exchanged in it.
Applying this to the commercialization of private location data, does it mean that users can receive as much money as they want for their data? No; it means they’ll be informed of the value of their information so they can decide whether they’re willing to provide it at a given price, with no hidden catches.
The big cost of big data
So far we have only talked about benefits, and it’s obvious that to calculate them we should not only look at the profits, but also at the costs.
From a strictly economic perspective, the cost for users is negligible, understanding that they already have the necessary devices and that the capture and transmission of the information will be done without affecting the normal functioning of their devices.
But what about buyers? What other costs do they have? Mainly the costs of the big data infrastructure.
A key enabler of big data is low-cost scalability. For example, a petabyte (PB) Hadoop cluster requires between 125 and 250 nodes and costs around $1,000,000. So, is it possible to store 1,048,576 gigabytes (GB) for $1,000,000, i.e., 1 GB for about $0.95? It’s not that easy.
In 2012, Amazon carried out a study on the costs associated with data warehouses, finding expenses of up to $25,000 per terabyte (TB) annually, i.e., $1,000,000 for 40 TB per year (about $24.41 per GB).
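These figures can be sanity-checked with a few lines of Python. This is a back-of-the-envelope sketch using binary units (1 PB = 1024 TB = 1,048,576 GB); the dollar amounts are the rough estimates quoted above, not exact prices:

```python
# Sanity-checking the storage figures quoted above (binary units).
cluster_cost = 1_000_000                   # ~$1M for a 1 PB Hadoop cluster
cluster_per_gb = cluster_cost / 1024**2
cluster_per_tb = cluster_cost / 1024
print(f"Raw cluster: ${cluster_per_gb:.3f}/GB, ${cluster_per_tb:.2f}/TB")  # $0.954/GB, $976.56/TB

warehouse_per_tb = 25_000                  # up to $25k per TB per year (Amazon study)
print(f"Warehouse: ${warehouse_per_tb / 1024:.2f}/GB per year")            # $24.41/GB

# The ~25x gap between raw cluster storage and a managed warehouse is the
# setup-and-maintenance overhead discussed in the text.
print(f"Overhead: {warehouse_per_tb / cluster_per_tb:.1f}x")               # 25.6x
```

Note how the raw cluster works out to roughly $976.56 per terabyte, while a managed warehouse costs roughly 25 times more per byte stored.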
What is behind these costs? Setup and maintenance. While storage becomes more affordable every year, engineering is what lies at the heart of the issue: challenges such as i) scrubbing information, ii) maintaining security, iii) establishing compatibility with business intelligence and analytics tools and iv) ongoing data movement must all be solved.
However, a high operational cost is not problematic if the Return On Investment (ROI) is adequate, and in this field it is. As noted above, the big data market was worth $125 billion in 2015, which gives a sense of just how much financial capital enterprises are pouring into data operations.
The reader could reason, ‘the costs are very high, but the ROI is also high, so it’s worth it’. But this admits another interpretation: ‘if the implementation of the solution is inadequate, you can lose a lot of money’. If you base your big data and analytics solutions on low-quality data, you will see little ROI; one analysis concluded that a company was losing about $81,000 per month by failing to leverage data analytics effectively. Therefore, it is not enough to use an adequate infrastructure; it must also be filled with quality data.
It’s true that companies make large profits by exploiting our private data, but we must understand that their operating costs are high. A company that invests in big data does so based on forecasts: the higher the costs and the lower the estimated ROI, the greater the risk it must face and the lower the probability that it’ll invest in this area.
Supply and demand
So, what is the best thing we can do? Tell the truth. It’s necessary to speak frankly with users and explain to them that their private information is valuable, and why. No more tricks. Companies must decide how much they’re willing to pay for this information and each user will decide whether to accept the offer. It’s that simple.
Ultimately, the difference between how much companies are willing to pay and how much users are willing to accept will result in an economic equilibrium for the price, something studied by the economic model of supply and demand.
The model is usually represented using Alfred Marshall’s supply and demand graph, in which the demand and supply curves are plotted; the point at which they intersect is the economic equilibrium that determines the price.
Understanding how a free and competitive market works using this chart is extremely easy. Let’s see it by applying the four basic changes to a hypothetical private location data marketplace: i) an increase and ii) a decrease in demand, and iii) an increase and iv) a decrease in supply.
Unlike other markets, ours works by accumulation, that is, each new good is added to the total. The reader might think that this increase in quantity goes hand in hand with a decrease in price, but that is a very simplistic interpretation. Historical values will become increasingly cheap, but the most recent information, which is of greater value for big data analysis since it reveals current trends, is generated daily, and its economic value can be measured using this economic model.
We’ll assume that the market is described by the previous graph, that the price of a location (L) in monetary units (MU) is 1 L = 1 MU, and that this ratio is fixed.
If GeoDB’s pool of location data becomes more interesting to buyers, there will be (i) an increase in demand, which will cause an appreciation of the MU.
In the opposite case, (ii) a decrease in demand will cause a depreciation of the MU.
Regarding users, (iii) an increase in the number of users willing to sell their private locations will increase the supply and, consequently, cause a depreciation of the MU.
In the opposite case, (iv) a decrease in supply will cause an appreciation of the MU.
It’s simple, right? And what will the price of the MU be? We don’t know; we only know that it will lie somewhere in this area.
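The four cases above can be made concrete with a toy Python sketch of the model, using assumed linear demand and supply curves; all the numeric parameters are illustrative assumptions, not GeoDB figures:

```python
# Toy linear supply/demand model for a hypothetical location-data market.
def equilibrium(a, b, c, d):
    """Demand curve: P = a - b*Q.  Supply curve: P = c + d*Q.
    Returns the (quantity, price) at which the two curves intersect."""
    q = (a - c) / (b + d)
    return q, a - b * q

# Baseline market
q0, p0 = equilibrium(a=10, b=1, c=2, d=1)

# (i) Demand increases (demand curve shifts up) -> the MU appreciates
q1, p1 = equilibrium(a=12, b=1, c=2, d=1)

# (iii) Supply increases (supply curve shifts down) -> the MU depreciates
q2, p2 = equilibrium(a=10, b=1, c=0, d=1)

print(f"baseline        : Q={q0}, P={p0}")   # Q=4.0, P=6.0
print(f"demand increase : Q={q1}, P={p1}")   # Q=5.0, P=7.0
print(f"supply increase : Q={q2}, P={p2}")   # Q=5.0, P=5.0
```

Shifting the demand curve up raises the equilibrium price, and shifting the supply curve down lowers it, exactly the appreciation and depreciation of the MU described above.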
However, it’s a huge improvement over the current situation, in which companies try to convince us that there is hardly any demand and consequently the graph is somewhat similar to the following:
Part 3. Blockchain 101
“Trust starts with truth and ends with truth”
Santosh Kalwar, 2010 
So far, we’ve reviewed the great value of private location data and reasoned that, in a free and competitive market, the price will be dictated by both sellers and buyers. Now it’s time to talk about technology, or rather, about technological paradigms. It’s time to talk about blockchain technology.
Nowadays everyone talks about blockchain technology. They talk about its virtues and the opportunities that it offers; and they argue over whether it’s the next technological revolution or the next economic bubble.
The main reason behind all the fuss is that blockchain is an enabling technology, i.e., a technology that makes possible things that were impossible before its conception.
Technologies of this kind imply a paradigm shift, which is why they’re so difficult to understand and accept. On top of that, in this case there is a strong economic component, so a huge amount of information circulates about it, much of it inaccurate or completely wrong.
For all these reasons, before detailing any proposal that uses this technology, it’s worth spending a little time clarifying the pillars on which it’s built.
To reassure readers who fear finding a handful of complex concepts: blockchain technology is absurdly simple. If until now it has seemed complex and convoluted, it’s because nobody has explained it to you properly.
What is a blockchain?
A blockchain is a collection of information structured in a way that guarantees certain properties, such as the immutability of the data (historical records cannot be modified) and its authenticity (the creator of the data is who they claim to be).
The distinctive component of a blockchain is the block which, simplifying to the extreme, can be divided into three subcomponents:
- A content.
- A reference to the previous block.
- A digest (hash value) computed from the two previous subcomponents.
The slightest change in the content of a given block will change the digest of that block and of all subsequent ones, so it isn’t possible to modify a historical block in isolation.
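The three subcomponents and the tamper-evidence property can be sketched in a few lines of Python. This is a toy illustration, not GeoDB’s actual data model; SHA-256 stands in for whatever digest function a real network uses, and the sample contents are made up:

```python
import hashlib

def digest(content, prev_hash):
    """The third subcomponent: a digest of the content plus the previous reference."""
    return hashlib.sha256((content + prev_hash).encode()).hexdigest()

def make_chain(contents):
    chain, prev = [], "0" * 64          # conventional all-zero genesis reference
    for content in contents:
        block = {"content": content, "prev": prev, "hash": digest(content, prev)}
        chain.append(block)
        prev = block["hash"]
    return chain

def is_valid(chain):
    prev = "0" * 64
    for block in chain:
        if block["prev"] != prev or block["hash"] != digest(block["content"], prev):
            return False
        prev = block["hash"]
    return True

chain = make_chain(["alice@(40.4,-3.7)", "bob@(51.5,-0.1)", "carol@(48.9,2.3)"])
print(is_valid(chain))        # True

# Tampering with a historical block breaks its digest and every later link.
chain[1]["content"] = "bob@(0.0,0.0)"
print(is_valid(chain))        # False
```

Changing one block’s content invalidates its stored digest, and every subsequent block still references the old digest, so the tampering is immediately detectable.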
Although the distinctive component of blockchain is the block, the distinctive component of blockchain technology is the network in which the blocks are generated and added to the chain.
Multiple nodes can participate in a blockchain network, each with its own copy of the blockchain. Using algorithmic negotiation techniques based on consensus, all the nodes decide in a decentralized way how to generate new blocks and how to add them to the distributed blockchain, which is considered valid by all of them.
This ability to generate immutable information in a decentralized way is what makes blockchain an enabling technology, with which it’s possible to carry out actions that were inconceivable until its emergence.
Well, what would you think if we told you that the blockchain and blockchain technology are just that? Wouldn’t you say it’s borderline simple?
You may consider the above explanation incomplete, and you may miss certain terminology commonly associated with blockchain. Well, let us tell you that the previous explanation is not incomplete; we might even consider it too extensive.
The terminology you may be missing, which usually appears in any blockchain talk (transactions, consensus protocols, tokens, smart contracts and so on), is what generates confusion and makes it look more complicated than it actually is. Believe us, those terms are only nuances in the definition of this technology.
What is a blockchain useful for?
As remarked previously, blockchain technology is an enabling technology that allows things that were unthinkable until now.
A common error when a technology of this type appears is to try to use it to replace systems in production, forcing new and convoluted mechanisms that, in many cases, are far from improving existing ones.
It’s true that blockchain technology allows us to improve certain processes, but we can be much more ambitious.
The main value of this technology lies in the fact that it allows the decentralization of trust. This lets us design tools for scenarios in which entities that do not know each other can trust each other with guarantees.
At the dawn of the internet, the vast majority of the applications that appeared only sought to replace existing processes (digital press, personal sites, landing pages for brands, …). Time has shown us that even the most visionary people were unable to conceive the potential of this technology.
We’re at the dawn of the blockchain, and we have in front of us a huge number of scenarios in which the decentralization of trust will allow us to create new and unique solutions.
Why use blockchain technology in GeoDB?
In the previous parts we talked about the great utility of location for big data analysis, but also about the huge cost of these analytical techniques for companies, and about how harmful it can be for a business to draw conclusions from a big data analysis in which wrong data was used.
Using blockchain technology allows us to capture location information immutably, guaranteeing, among other things:
- That the locations were captured by whoever claims to have captured them. The only thing you can say is where you’ve been.
- That the locations were captured at a given moment. We capture stories. A person, in a place, at a given time.
- That history is immutable. It’s not possible to embellish the results to demand higher compensation, or to remove certain information so that competitors cannot use it. There is no place for a Minitrue.
- That the results are verifiable. Everyone can check that each data query returns exactly the information it should return. There is no noise or hidden information.
You might ask us: are you proposing a proof-of-location system? Not for the moment. We’re aware that users can use different mechanisms to distort their real position, and for that reason we’ll make available an SDK optimized to minimize, though not eliminate, those cases. Our locations therefore have no validity as proof for a third party; they have value only because they’re valid for big data analysis. Paradoxically, since users have no incentive to cheat, the vast majority of data will be real and therefore of high quality for big data analysis.
But the advantages of using blockchain technology lie not only in the processing of information, but also in the access to it. A blockchain infrastructure gives us the necessary tools to transmit economic value through the ecosystem, adequately rewarding all the stakeholders of the system and fostering the development of an ecosystem based on trust and economic equity.
Part 4. Modular blockchain architectures
“Everything should be made as simple as possible, but no simpler”
Attributed to Albert Einstein, 1933 
When we created our blog to share our idea with the world, we asked ourselves: what is the best way to present GeoDB to a person who has absolutely no idea about the project? Do we write an extensive publication explaining absolutely everything, of which nobody will read more than the first couple of paragraphs? Do we write a brief summary, an amalgam of ideas in which we explain everything but nothing is clear? Or do we make a logical division of our proposal and explain each concept independently?
It’s said that sometimes, to find a satisfactory solution to a dilemma, it’s best to reformulate the main question. In our case, what do we want? We want to explain our proposal as clearly as possible, so it’s not necessary to spell out what our decision was. By dividing a proposal into its logical parts, we can explain each of them simply while still delving into all the relevant details.
The GeoDB architecture was proposed following a very similar approach. Our infrastructure is based on what we have termed a Modular Blockchain Architecture, or MBA, which focuses on the interconnection and interoperability of different blockchain networks. You may think it’s a convoluted solution, but believe us, it’s the best solution and, at the same time, the simplest. Let us explain.
Interconnection is the key
The term blockchain isn’t the most appropriate to describe the technology it refers to; Distributed Ledger Technology, or DLT, is a more precise term. However, the irruption of blockchain as the first DLT has led us to use the term blockchain to name some things that really aren’t blockchains.
Something similar occurs with the mother of all blockchains, Bitcoin. Due to its advent as the first blockchain, the Bitcoin network has far greater diffusion than the rest of the blockchain networks, and its cryptoasset, the bitcoin, a much higher value than other cryptoassets.
The spectacular rise in the price of bitcoin, as well as of the rest of the cryptoassets, has led to meaningless discussions about which blockchain network will prevail over the rest, something reflected in the use of the term altcoin.
These are sterile discussions because they rest on the hypothesis that one blockchain network will triumph over the rest, a conjecture that only reflects ignorance of blockchain technology.
As soon as we analyze how blockchain technology works and what it allows us to do, we realize that it’s nonsense to propose a single blockchain network to cover the needs of all types of users and entities. Are the needs of a bank identical to those of an insurance company, or those of a country’s social security system? Should the data be stored in the same way and be accessible under the same scheme in all cases?
Obviously the answer to both questions is a resounding no, which is why an increasingly rich vocabulary is used to talk about types of blockchain networks.
In the future there will be multiple blockchain networks, each covering the needs of a specific sector and therefore designed with unique characteristics that allow it to provide very specific features. Without going any further, today we already have great heterogeneity: blockchains in which all information is public and others in which it is encrypted; blockchains in which anyone can participate and others with restricted access; blockchains that use smart contracts as opposed to others that only store data; consensus algorithms based on PoW or PoS; and so on.
We could extend the previous list considerably, but our goal here is not to establish a taxonomy, but to illustrate the absurdity of thinking that a single blockchain network is the solution to every existing problem. As in so many other cases, there are no silver bullets here.
We must be aware that this heterogeneity is not a problem but the opposite, since we can use customized blockchain networks to cover our needs optimally. This scenario of multiple blockchains has led to the creation of gateway services, which allow us to interact with several blockchain networks simultaneously.
We are convinced that this interconnection will prevail in the coming years, as it fosters the emergence of high-value synergies: i) blockchains with health data partially accessed by the blockchains of insurance companies; ii) blockchains with sports results accessed by the blockchains of sports betting companies, and these in turn accessed by the blockchain of a country's tax agency; iii) a blockchain of geo-positioning data accessed by the blockchain of a research institute, and this in turn accessed by the blockchain of a verification agency; and we could continue with a long etcetera.
When we proposed GeoDB, we asked ourselves: what is the most appropriate blockchain architecture for our proposal? We want to store data under a big data paradigm, making selling easy for users and buying easy for customers. But we also want to facilitate the development of novel applications on our infrastructure.
Our solution is a modular blockchain architecture, which we have conceived as an architecture of modular nodes that can participate simultaneously in multiple blockchains under different roles.
Our architecture offers great advantages, among which we highlight:
- Our infrastructure is scalable and adaptable to multiple types of needs.
- The modification of the infrastructure is simple and can be optimized in each case.
- The participation of a node can be evaluated according to different parameters, so it can obtain greater rewards.
Our concept of modularity
Perhaps, after reading the previous section, it's clear to you what kind of architecture we want to propose, but we know that the term modularity can be ambiguous in a technological context. When we talk about modularity, our thoughts can go in several directions:
- From a design perspective, we can see it as a software technique that separates the functionality of a program into independent parts, each containing everything necessary to execute one aspect of the desired functionality.
- From a development perspective, we can see it as an approach that subdivides a system into smaller parts that can be created independently and then used in different systems.
- From an infrastructure perspective, we can see it as an expandable architecture in which systems are expanded through modules that provide them with a specific role, with which they can participate in our system or in other systems.
The latter is the one we refer to. If it's still unclear, the following diagrams will resolve your doubts. Let's suppose that this is a blockchain.
Multiple nodes can participate in a blockchain network, each with a local copy, complete or not, of the blockchain.
The nodes are connected, usually under a P2P network topology, forming a blockchain network.
The nodes that participate in the blockchain network follow a consensus algorithm that allows them to agree on how to add blocks to the blockchain and what is the correct blockchain version.
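The hash-linking that makes a blockchain tamper-evident can be sketched in a few lines of Python. This is a toy illustration of the general mechanism, not GeoDB code; all names and fields are ours:

```python
import hashlib
import json
import time

# Each block stores the hash of its predecessor, so altering any block
# invalidates every block after it.
def block_hash(block: dict) -> str:
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def make_block(data, prev_hash: str) -> dict:
    return {"timestamp": time.time(), "data": data, "prev_hash": prev_hash}

# Build a tiny chain of three blocks.
chain = [make_block("genesis", "0" * 64)]
chain.append(make_block({"lat": 40.4, "lon": -3.7}, block_hash(chain[-1])))
chain.append(make_block({"lat": 40.5, "lon": -3.6}, block_hash(chain[-1])))

def is_valid(chain) -> bool:
    # Every block must reference the actual hash of the previous one.
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

print(is_valid(chain))          # True
chain[1]["data"] = "tampered"   # any modification breaks the links
print(is_valid(chain))          # False
```

The consensus algorithm is what lets all nodes agree on which of these hash-linked histories is the correct one.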
There are different blockchain networks, with more or fewer similarities between them.
Each of these networks has a purpose, so they use different consensus algorithms, different types of blocks and different access policies.
On many occasions, the data of one blockchain network can be useful in another. In these cases, gateways capable of interacting with both networks can be implemented.
The implementation of a gateway varies in each case and is far from being a trivial process.
Given that the previous situation is not exceptional but recurrent, we propose an expandable blockchain architecture in which nodes are expanded through modules that provide them with a specific role, with which they can participate in one or more blockchain networks.
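A minimal sketch of this expandable-node idea, with module, network and message names invented purely for illustration:

```python
# Hypothetical sketch: each plugged-in module gives the node a role
# (validator, gateway, archive...) in one blockchain network, so a
# single node can participate in several networks at once.

class Module:
    def __init__(self, network, role):
        self.network = network
        self.role = role

    def handle(self, message):
        # A real module would run consensus, relay blocks, etc.
        return f"[{self.network}:{self.role}] processed {message['type']}"

class ModularNode:
    def __init__(self):
        self.modules = {}          # one module per network

    def plug(self, module):
        self.modules[module.network] = module   # expand the node's roles

    def dispatch(self, network, message):
        return self.modules[network].handle(message)

node = ModularNode()
node.plug(Module("geodb-ledger", "validator"))
node.plug(Module("ethereum", "gateway"))
print(node.dispatch("geodb-ledger", {"type": "new_block"}))
print(node.dispatch("ethereum", {"type": "anchor_digest"}))
```

The point of the design is that adding a role in a new network is just plugging in another module, without touching the rest of the node.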
Part 5. Measuring size and cost
“There are three kinds of lies: lies, damned lies, and statistics”
Attributed to British Prime Minister Benjamin Disraeli 
Everything said so far is nothing but romanticism: i) huge amounts of money are being invested in the field of big data, ii) locations are very valuable for big data analysis, iii) users should have control over their private information and should receive adequate compensation for providing it, iv) blockchain technology allows us to guarantee the immutability of the information and therefore improve the quality of the data for the analysis, and many other beautiful ideas.
But there is a question that any critical reader should ask: does all this make sense as a whole? That is, is it reasonable to propose a big data scenario with private location data? And in that case, is it possible to use current big data technology to guarantee immutability or other properties of the data?
It's commonly said that on paper everything is possible. With little effort, we can use any data to support any idea. As Benjamin Disraeli said, statistics can be a kind of lie.
In this part we’re going to show you the results of some studies that we’ve carried out about the estimated size of our big data and about the cost of storing this information using blockchain technology.
We know that it’s inevitable to introduce certain bias in the exposed results due to i) our experience in the private location field and ii) the novelty of the blockchain paradigm, but we’ll do our best to maintain an objective position that allows you to obtain your own conclusions.
How big is our big data
Maybe after reading the previous parts, someone may doubt whether this is indeed a big data scenario or only one with a lot of data. Let’s take a closer look.
Suppose that an app uses an SDK provided by GeoDB, allowing its users to transfer their locations in exchange for an economic reward. In this app, each position (a located pin point) will be composed of seven 8-byte fields, or 56 bytes per position: i) latitude, ii) longitude, iii) timestamp and iv) four other fields to store data about the state of the mobile network or the device.
The app runs in the background and captures a position every 3 minutes for a total of 16 hours (assuming users turn off their phones at night). With this configuration, 320 locations (16 * 60 / 3) per day will be generated for each user, or, what is the same, 17.920 bytes or 17,5 KiloBytes (KB).
After processing the positions to eliminate outliers, let's suppose the size is reduced by 10%, so the size of the user's locations will be 15,75 KB. To this set of locations, some semantic data about the user is added, such as mobile model, age range or gender. In total, we assume an addition of 0,5 KB, which leaves a final value of 16,25 KB per user per day in this example.
Assuming that the above functionality is used by 1.000 users of the app, each day the app will generate 15,87 MegaBytes (MB) of locations. In addition, we estimate that the SDK will generate an additional 5% of information to store the statistical and semantic information necessary to carry out big data analysis, so the final size will be 16,66 MB per day.
In the long-term, for every 1.000 users, 5,94 GigaBytes (GB) of data would be generated in a year, 29,7 GB in 5 years and 124,73 GB in 21 years. But 1.000 users is a toy sample and, you know what? The current (Q2 2018) size of the Bitcoin blockchain after eight years is 169,12 GB. Assuming a more plausible number of users, in a range of 1.000.000 to 100.000.000, the values would be the following:
1M of users after:
- 1 year: 5.939,3 GB
- 5 years: 29.696,52 GB
- 21 years: 124.725,4 GB
100M of users after:
- 1 year: 593.930,48 GB
- 5 years: 2.969.652,41 GB
- 21 years: 12.472.540,14 GB
So you can draw your own conclusions: 12.472.540,14 GB are 12.180,21 TeraBytes (TB) or 11,89 PetaBytes (PB), and according to a study carried out by Amazon in 2012, the price of a big data infrastructure of this size would cost $184.108.40.2069,12 per year.
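The sizing arithmetic above can be reproduced with a short script, using the same assumptions: 56-byte positions, one sample every 3 minutes over 16 hours, 10% outlier removal, 0,5 KB of semantic data and 5% SDK overhead:

```python
# Back-of-the-envelope sizing, with 1 KB = 1024 bytes.
POSITION_BYTES = 56              # seven 8-byte fields
POSITIONS_PER_DAY = 16 * 60 // 3 # every 3 minutes for 16 hours -> 320

raw_kb = POSITIONS_PER_DAY * POSITION_BYTES / 1024   # 17.5 KB per user/day
per_user_kb = raw_kb * 0.90 + 0.5   # -10% outliers, +0.5 KB semantic data

def yearly_gb(users: int, years: int) -> float:
    per_day_mb = users * per_user_kb / 1024 * 1.05   # +5% SDK statistics
    return per_day_mb * 365 * years / 1024

print(round(per_user_kb, 2))               # 16.25
print(round(yearly_gb(1_000, 1), 2))       # 5.94
print(round(yearly_gb(1_000_000, 21), 0))  # 124725.0
```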
The cost of storing in blockchain
We propose an infrastructure for the storage and commercialization of data under a big data paradigm using blockchain so that users, as information generators, and entities, as customers seeking to obtain large volumes of high quality data, can obtain benefits.
But how much does it cost to store a single GB in a public blockchain? A huge amount of money, let’s see why.
Taking as a reference the first week of July 2017 and the first week of July 2018, the costs of persisting a GB of information in the blockchains of Bitcoin (BTC), Ethereum (ETH) and Stellar (XLM), compared to the cost of storing the same information on traditional Hard Drive (HD) disks or on the newer and more efficient Solid State Drive (SSD) disks, are:
Cost per GB in July 2017
- BTC: 22.766.250,000$
- ETH: 4.672.500,000$
- XLM: 3.166.229,000$
- HD: 0,025$
- SSD: 1,010$
Cost per GB in July 2018
- BTC: 57.909.998,000$
- ETH: 7.716.975,000$
- XLM: 31.662.297,000$
- HD: 0,023$
- SSD: 0,75$
The following graph clearly reflects the cost and the current trend.
Let’s put this data in context.
A current smartphone like the iPhone X has 64 GB of storage in its basic model. Storing 64 GB of data in the Bitcoin blockchain would cost 3.706.239.872$, slightly less than the annual GDP of a country of 7.000.000 inhabitants like Sierra Leone.
In view of this, is it reasonable to store big data using blockchain technologies? To answer affirmatively it is only necessary to think about how blockchain works.
If we consider that, we can follow a hybrid proposal in which we use:
- Blockchain technology to store the data and,
- Popular blockchains to store the block summaries (digests), a minimal part of the information, to guarantee the immutability and authenticity of the data with the same level of security as if all the data were stored in these blockchains
Following this approach, using a blockchain in which the cost of transactions covers the cost of the disk space needed for (i), and a popular blockchain like Ethereum for (ii), it would be possible to store a lot of information for very little money.
Obviously, the previous costs should not be considered final costs, since in a real infrastructure there are other costs such as electricity, interconnection or redundancy. However, we think they clearly show that the cost of blockchain storage is not an issue for our proposal if we follow a similar approach.
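The key observation behind the hybrid scheme is that a cryptographic digest has a fixed, tiny size regardless of how much data it covers. A minimal sketch, using SHA-256 as a stand-in for whatever digest function an implementation would choose:

```python
import hashlib

# The bulk data lives in a cheap, application-specific ledger; only a
# fixed-size digest of each block is anchored in a popular public chain.
def digest(block_bytes: bytes) -> str:
    return hashlib.sha256(block_bytes).hexdigest()   # always 32 bytes

bulk_block = b"gigabytes of location records ..." * 1000  # stays off-chain
anchored = digest(bulk_block)                             # goes on-chain

# Later, anyone holding the bulk data can check it against the anchor:
assert digest(bulk_block) == anchored
print(len(anchored) // 2, "bytes anchored instead of", len(bulk_block))
```

Changing a single byte of the bulk data would change the digest, so the public chain's immutability extends to the off-chain data at a fraction of the cost.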
This hybrid approach may sound very good on paper, but you may be wondering if an interconnection between several blockchains will not end up leading to other costs. Like so many other things in life, everything depends on the point of reference, or in our case, on how we carry out the interconnection. Our next part will be about it.
Part 6. Interconnection
“It isn’t that they can’t see the solution. It is that they can’t see the problem”
G.K. Chesterton, 1935 
In our last part we talked about a hybrid approach in which we propose to store certain information using blockchain technology and other information in a public blockchain in order to minimize costs and maximize security. In our opinion, this scheme offers an optimal solution for several key aspects of our domain such as storage cost, scalability, security or immutability among many others.
We cannot ignore that there are currently other proposals that, using blockchain technology, aim to reward users for providing their private information, each of them following a different architecture. Therefore, to clearly understand the motivation behind our architectural design, it's convenient to reflect on the domain in which GeoDB is defined.
GeoDB is a proposal conceived for the commercialization of private locations under a big data paradigm. The big data market has an economic value of billions of dollars ($125.000.000.000 in 2015 ), but we must understand that the big data paradigm is not related to the individualized sale of user data. Our architecture has been conceived to:
- Make available the private location information of millions of users. A big data query is not about the behavior of an individual, but of a large number of them.
- Guarantee the integrity and immutability of terabytes of information. Do you know the size of a public blockchain? In the case of Bitcoin, its current size after eight years is 169,12 GB. The amount of location information generated by just 10.000.000 users on a daily basis is almost the same.
- Resolve complex queries in this volume of information in order to obtain relevant information.
Due to the above, we believe that the only way to build GeoDB today is on a hybrid architecture. We must be clear that a hybrid solution is not a magic solution, since it's well known that the interconnection of two isolated components can be even more complex than the creation of the components themselves. In addition, our scenario has an additional problem: we need to write additional information to a public blockchain, and this is very expensive.
The question at this point is: is it possible to do this at a reasonable cost? Following a traditional approach, it's not; but we should emphasize the word traditional. To paraphrase G.K. Chesterton, maybe it isn't that we can't see the solution; it's that we can't see the real problem.
Our technological stack
Broadly speaking, we can say that we propose a hybrid architecture based on blockchain technologies in which we will use:
- An ERC-20 token to manage the economic value of the locations.
- Open source blockchain technology for our infrastructure.
Why an ERC-20 token?
ERC-20 is the de facto standard for the definition of tokens. It's a type of token defined on the Ethereum blockchain, and there is no mystery in its name: ERC stands for Ethereum Request for Comment, and 20 is the number assigned to this request.
Defining a token using this standard is a guarantee for us and for the users of GeoDB, due to the wide use of Ethereum and the existence of multiple popular services adapted to it.
Nobody should be surprised by the fact that today, more than 100.000 tokens are defined as ERC-20 tokens .
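To illustrate the accounting model behind an ERC-20 token: real tokens are smart contracts on Ethereum, typically written in Solidity, so this Python class is only a toy analogy with invented names, showing the balance bookkeeping the standard revolves around:

```python
# Toy illustration of ERC-20-style bookkeeping: a mapping of addresses
# to balances, plus a transfer that refuses overdrafts.
class ToyToken:
    def __init__(self, total_supply: int, owner: str):
        self.balances = {owner: total_supply}

    def balance_of(self, addr: str) -> int:
        return self.balances.get(addr, 0)

    def transfer(self, sender: str, to: str, amount: int) -> bool:
        if amount < 0 or self.balance_of(sender) < amount:
            return False
        self.balances[sender] -= amount
        self.balances[to] = self.balance_of(to) + amount
        return True

geo = ToyToken(1_000_000_000, "treasury")
assert geo.transfer("treasury", "alice", 100)
print(geo.balance_of("alice"))            # 100
print(geo.transfer("alice", "bob", 500))  # False: insufficient balance
```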
Why open source blockchain technology?
We believe that it is unnecessary to reinvent the wheel. There are proven technological solutions with which we can provide many of the elements that are necessary for the infrastructure that we want to build.
Currently, CoinMarketCap lists 838 coins associated with full blockchain implementations, and you know what? Almost all of them are open source. But open source blockchain solutions do not end here. More and more companies are promoting open source projects for the development of blockchain frameworks with which it is possible to deploy adapted blockchain solutions. Have you heard about Corda, HyperLedger, BigChainDB, OpenChain or MultiChain? If you don't know any of them, be prepared to experience the Baader-Meinhof phenomenon.
Deploying a smart contract to define an ERC-20 token is extremely simple. Deploying a smart contract without security problems is a bit more complicated. Before using smart contracts in Ethereum, it's necessary to consider two critical points:
- Writing to Ethereum is, and always will be, expensive. The storage space in a blockchain like this should be considered a precious resource, so prices must be high so that it isn't wasted.
- The code is law. Once a smart contract has been deployed, it cannot be modified. You can only modify the behavior of a deployed smart contract if you've developed it to allow the desired change. Obviously, any poorly designed mechanism opens a security gap that can be exploited by third parties, and no one can stop them.
One of the main characteristics of Ethereum smart contracts is that they do not allow the execution of non-deterministic code. Among many other things, this implies that only information that already exists in Ethereum can be used when they're executed.
Taking the above into consideration and focusing on our proposal, the scenario is as follows:
- We use an ERC-20 token that makes it possible to fairly reward the participants.
- The rewards are assigned based on the participation in the GeoDB big data ledger.
- To assign the rewards in Ethereum, it's necessary to transfer the relevant information from the GeoDB ledger to the Ethereum blockchain. For this, it's necessary to consider that:
- Ethereum storage is expensive, so the amount of data to write must be minimal.
- The transaction that triggers the reward assignment must be secure. Otherwise, anyone (users, nodes or any others) could claim rewards without deserving them.
How can we combine all this? Our approach is to use a paradigm that we call request-approval-justice.
First, a smart contract will be deployed in Ethereum to define an ERC-20 token. This contract will be designed to allow:
- Setting the rules that regulate the supply.
- Assigning tokens to specific addresses.
- Reclaiming tokens for a given address.
Another smart contract will be deployed in the GeoDB big data ledger to account for the tokens that can be claimed by each participant based on their participation. Under this paradigm, each address will have two token balances: i) tokens in Ethereum that we can transfer, and ii) claimable tokens in the GeoDB big data ledger that we cannot move but that we can claim to have assigned to us in Ethereum.
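The two-balance scheme can be sketched as follows. This is a hypothetical illustration of the accounting, with names of our own choosing, not the actual contracts:

```python
# Each address has i) transferable tokens "in Ethereum" and
# ii) claimable tokens "in the GeoDB ledger" that cannot be moved,
# only claimed, which converts them into transferable tokens.
class DualBalance:
    def __init__(self):
        self.ethereum = {}     # transferable balances
        self.claimable = {}    # earned in GeoDB, not yet claimed

    def reward(self, addr: str, amount: int):
        self.claimable[addr] = self.claimable.get(addr, 0) + amount

    def claim(self, addr: str) -> int:
        amount = self.claimable.pop(addr, 0)
        self.ethereum[addr] = self.ethereum.get(addr, 0) + amount
        return amount

ledger = DualBalance()
ledger.reward("alice", 40)
ledger.reward("alice", 10)
print(ledger.claim("alice"))     # 50: moved to the Ethereum side
print(ledger.ethereum["alice"])  # 50
print(ledger.claim("alice"))     # 0: nothing left to claim
```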
Additionally, there will be a set of nodes that, using a PoS mechanism to guarantee their correct behavior, will write the summaries of the blocks to the Ethereum blockchain once the blocks have reached a given depth. These summaries will be used as references for the execution of Ethereum smart contracts, as we will explain later.
Storing a summary in Ethereum currently costs slightly less than half a dollar. Considering that 288 blocks are created daily if a block is created every five minutes, the cost of storing the summaries of all the blocks would be $144 per day, or $52.560 per year. However, storing all the summaries is neither absolutely necessary nor safer. For example, by storing one summary every hour, the daily cost would be $12, or $4.380 per year.
As we'll see, to apply 'justice' the tokens are not rewarded immediately, so the number of summaries to be saved could be as low as one every several days without any problem.
So far we have:
- A smart contract in Ethereum to assign and move tokens.
- A smart contract in the GeoDB big data ledger to assign rewards and reclaim them.
- A set of nodes using a PoS mechanism to transfer the block summaries of the GeoDB big data ledger to Ethereum.
With this infrastructure, the first phase, Request, can be executed. In a request, a user claims the corresponding tokens in Ethereum using the GeoDB ledger. A node acting as a notary will verify that everything is correct and, in that case, generate a signed transaction in the GeoDB ledger that proves it. When the next summary of the GeoDB ledger is transferred to Ethereum, the node generates a new transaction in Ethereum specifying:
1. The amount of tokens to be transferred to a given address.
2. The hash of the transaction in the GeoDB ledger that proves that those tokens must be transferred to the address in (1).
3. The first GeoDB block summary stored in the Ethereum network that was generated after the transaction in (2).
Approval is not a phase but a state in the process. While a request involves the creation of several transactions, specifically two in the GeoDB ledger (claim and verification) and two in Ethereum (block summary and reward order), an approved transaction is simply a reward order made from an account with sufficient balance for the justice phase.
To apply justice, we follow the presumption of innocence, i.e., everyone is innocent until proven guilty. Under this approach, every approved order transaction is automatically considered valid once its challenge period has expired. The appropriate length of this period will be analyzed and fixed before the launch of our mainnet.
During the justice phase, anyone can check whether an approved transaction is legitimate, for which it's enough to check whether the verification transaction, signed by the node that acted as notary, is correct. Proving an order illegitimate transfers the notary's deposit to the accuser and blocks the assignment of the tokens.
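The justice-phase check can be sketched as follows. We use an HMAC as a stand-in for the notary's real signature scheme, and every name here is illustrative, not part of the actual protocol:

```python
import hashlib
import hmac

# A reward order is legitimate only if the notary's signed verification
# transaction checks out; anyone can perform this check.
NOTARY_KEY = b"notary-secret"    # in reality, an asymmetric key pair

def sign(claim: str) -> str:
    return hmac.new(NOTARY_KEY, claim.encode(), hashlib.sha256).hexdigest()

def is_legitimate(claim: str, signature: str) -> bool:
    return hmac.compare_digest(sign(claim), signature)

claim = "transfer 50 GEO to alice"
good_order = sign(claim)         # properly verified by the notary
forged_order = "0" * 64          # no valid verification behind it

print(is_legitimate(claim, good_order))    # True: the order stands
print(is_legitimate(claim, forged_order))  # False: deposit is slashed
```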
We still have to carry out several experiments to refine and optimize our paradigm, but in general terms we can see how it allows us to minimize the interconnection costs. As it was initially indicated, to find the solution sometimes it’s only necessary to rethink the problem. In our case, the problem is not how to communicate, but what is communicated and who makes the communication.
Part 7. The GEO token
“Three eras of currencies: commodity based, government based, and math based.”
Chris Dixon, 2015 
When we talk about creating a new token, we're aware that many readers will ask themselves: is this necessary? This doubt will usually come to those who observe blockchain technology from a competitive point of view, many of whom consider that a fierce battle is currently being waged among hundreds of cryptoassets until, finally, only a handful of them are victorious.
For us, the reality is a little less romantic, although no less epic.
As we’ve indicated in previous parts, for us there are no silver bullets in this area. Our vision is that blockchain technology is destined to transform our social and commercial relationships in a myriad of areas. Currently even the most visionary can not imagine everything that it’ll allow us to do.
Nobody knows in what direction this technology will evolve in the next decades. There’s only agreement at one point among all the experts in the area, it’s here to stay. We know that many people have earned a lot of money thanks to being pioneers in the field. Whether due to technological ignorance or speculative interests, many of them still defend the competitive vision among solutions, but gradually it’s reaching a consensus situation under which the discourse is that the coexistence of blockchain solutions adapted to different scenarios will be common in the future. It’s just common sense. The real world is heterogenous, each problem is unique, and each of them requires a customized solution.
The irruption of blockchain technology has brought about a paradigm shift in how to approach economic relations. Frankly, is anyone able to imagine a better mechanism for the transmission of economic value than one built under a decentralized trust scheme? While for a time many people were unable to see beyond the pure economic utility of the technology, in recent years the collective vision has been transformed to the point that today, the most common discourse is that blockchain technology is much more than a tool for the transmission of economic value.
Blockchain technology is a tool to build solutions in scenarios in which the decentralization of trust allows the emergence of new possibilities. For example, from the perspective of the use of private location information for big data analysis, this technology makes it possible to build a solution that guarantees that i) the locations were captured by whoever claims to have captured them, ii) the locations were captured at a given moment, iii) the history is immutable, and iv) the results are verifiable. A scenario of this type, built on the pillars of trust and decentralization, is also the best option for the transmission of the economic value of the commercial operations that occur within it.
In a world in which specific blockchain solutions coexist, some of them designed for scenarios in which the same blockchain solution is also the best way to transmit economic value, it may be convenient or even necessary to define a digital asset to capture and facilitate the transmission of the economic value of the commercial operations carried out within it.
As we've indicated above, we don't support the opinion that there's a competition in which one cryptoasset will triumph over the rest, except obviously in those cases in which several cryptoassets were created to solve an identical problem. For us, these digital assets are the ideal mechanism to transfer economic value in the commercial operations carried out in a scenario, and their total capitalization will be proportional to the size of the market in which they've been defined.
From this perspective, we propose the definition of a specific token for GeoDB, the GEO token.
GeoDB is building a structure in which the exchange of value is based on the creation of its own token, the GEO token. The GEO token represents the market value of big data in GeoDB's marketplace. To achieve this, it's vital to accurately design two aspects: i) the cost of data acquisition and ii) the incentive system.
Cost of data acquisition
We must carefully model the cost of data acquisition in order to maintain an equilibrium between benefits for users and costs for buyers at all times. We believe that the best way to do this is to link the cost of data acquisition to the diffusion rate expected for GeoDB, establishing reduced prices during the first years to encourage purchases and progressively increasing them until reaching the equilibrium point.
To estimate the diffusion of GeoDB, we embrace the theory of the diffusion of innovations. Professor Everett Rogers popularized this theory in his 1962 book, Diffusion of Innovations, which seeks to explain how, why, and at what rate new ideas and technology spread.
The concept of diffusion on which Rogers's theory is built was studied in 1890 by Gabriel Tarde in The Laws of Imitation. He identified 3 main stages through which innovations spread: 1) difficult beginnings, during which the idea has to struggle within a hostile environment; 2) exponential take-off of the idea; 3) a logarithmic stage, corresponding to the time when the impulse of the idea gradually slows down while, simultaneously, new opposing ideas appear. The ensuing situation stabilizes the progress of the innovation, which approaches an asymptote. This diffusion process is usually modeled using sigmoidal functions, also known as S-curves, which are widely used in fields such as artificial neural networks, biology, biomathematics, chemistry, demography, economics, geoscience, mathematical psychology, probability, sociology, political science, linguistics, and statistics.
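Tarde's three stages map naturally onto a logistic function: a slow start, an exponential take-off and a saturation towards an asymptote. A generic sketch, with illustrative parameters that are not GeoDB's:

```python
import math

def s_curve(t: float, ceiling: float = 1.0, midpoint: float = 0.5,
            steepness: float = 10.0) -> float:
    """Standard logistic: slow start, take-off, saturation at the ceiling."""
    return ceiling / (1.0 + math.exp(-steepness * (t - midpoint)))

# t is the fraction of the adoption period elapsed.
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"t={t:.2f}  adoption={s_curve(t):.3f}")
```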
The cost curve that we have defined for GeoDB is built from the following parameters:
- M, the maximum cost.
- m_0, the minimum cost.
- b, the depth of the current block in GeoDB big data ledger.
- B, the number of blocks to generate before reaching M.
- f_c, an adjustment factor of the steepness of the curve.
We have established the following values:
- M = 10.000.
- m_0 = 1.
- B = 2.207.520.
- f_c = 13.
So we can substitute in the previous function to obtain the cost function of GeoDB:
Its graphic representation can be seen below:
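As a hedged sketch of how such a curve behaves, the snippet below assumes a logistic form built only from the listed parameters; the exact function is the one shown in the figure above, so treat this as an approximation of its shape rather than the definitive formula:

```python
import math

# Assumed logistic cost curve rising from m_0 towards M as the block
# depth b approaches B, with f_c controlling the steepness.
M, m0, B, f_c = 10_000, 1, 2_207_520, 13

def cost(b: int) -> float:
    x = b / B                    # progress through the 21-year cycle
    return m0 + (M - m0) / (1.0 + math.exp(-f_c * (x - 0.5)))

print(round(cost(0), 2))         # low cost at launch
print(round(cost(B // 2), 2))    # 5000.5, the mid-cycle cost
print(round(cost(B), 2))         # approaching the maximum cost M
```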
We’ve talked about the cost in a generic way without specifying the units used to measure it. Our approach is to combine several aspects such as the amount of data, the number of blocks explored or the complexity of the query, to generate a cost metric that we’ll call Proof-of-Analytics, or PoA, as a clear nod to other similar ideas in this area such as PoW or PoS. PoA must be a verificable value obtained from the aggregation of a demonstrable amount of analysis work. It’ll be used in a cost function in which this value, the current diffusion of the protocol, and other intrinsic factors of the data such as its origin or its demand, will allow to set the price for the dataset.
The functions to calculate PoA and cost will be proposed in the technical white paper of GeoDB protocol, which will be available in a couple of months.
Taking into account the origin of the data or its demand to compute the cost of a dataset allows us to simultaneously create the mechanisms to reward users depending on the expected demand for their data and thus maximize the heterogeneity of the information.
The correct setup of the incentive system is one of the most delicate issues for the long-term economic sustainability of GeoDB's protocol. Any incentive distribution could seem adequate in the first days of operation, when there will be an abundance of resources for all, but an inequitable distribution will cause the participants who feel it is unfair to turn their backs on the project, which would be disastrous for its sustainability. Aware of this, we have designed an equitable incentive system, adjusted to the technological penetration we expect, in order to guarantee the interests of all the parties involved.
We've built a fixed, time-based token supply curve that establishes the token reward schedule over the first 21 years of system operation, growing from the 300.000.000 preassigned GEO tokens to our complete supply of 1.000.000.000 GEO tokens. Further data and our complete model can be accessed upon request.
To distribute the rewards, we have defined a decremental logarithmic model. This model will be repeated in cycles of 21 years, at the rate of one block every five minutes, giving a total of 2.207.520 blocks per cycle.
The cumulative reward curve that we have defined for GeoDB is built from the following parameters:
- T, the number of tokens to be rewarded.
- b, the depth of the current block in GeoDB big data ledger.
- B, the number of blocks to generate before reaching T.
- f_r, an adjustment factor of the steepness of the curve.
We have established the following values:
- T = 700.000.000.
- B = 2.207.520.
- f_r = 10.
So we can substitute in the previous function to obtain the cumulative reward function of GeoDB:
Its graphic representation can be seen below:
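Again as a hedged sketch: the snippet below assumes a concave release curve built from the listed parameters, so that the per-block reward decreases over the cycle as a decremental model implies; the exact function is the one shown in the figure above:

```python
import math

# Assumed cumulative reward curve: T tokens released over one cycle of
# B blocks, with per-block rewards largest at the start of the cycle.
T, B, f_r = 700_000_000, 2_207_520, 10

def cumulative_reward(b: int) -> float:
    x = b / B
    # Right half of a logistic: starts at 0, concave, saturates near T.
    return T * (2.0 / (1.0 + math.exp(-f_r * x)) - 1.0)

def block_reward(b: int) -> float:
    return cumulative_reward(b) - cumulative_reward(b - 1)

print(round(cumulative_reward(B) / T, 4))   # 0.9999, fraction released
print(block_reward(B // 4) > block_reward(3 * B // 4))  # True
```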
And with this we conclude our GeoDB discovery post series.
- Danezis, G. et al. How much is location privacy worth? In WEIS, volume 5. Citeseer, 2005.
- Cvrcek, D. et al. A study on the value of location privacy. In Proceedings of the 5th ACM workshop on Privacy in electronic society, pages 109–118. ACM, 2006.
- Staiano, J. et al. Money Walks: A Human-Centric Study on the Economics of Personal Mobile Data, 2014 ACM Conference on Ubiquitous Computing, pp. 583–594. ACM, Seattle. https://arxiv.org/pdf/1407.0566.pdf
- The Power of Place: How Location Intelligence Reveals Opportunity in Big Data.
- The eureka moment: Location intelligence and competitive insight.
- The Best of All Possible Worlds: Mathematics and Destiny. Ivar Ekeland, 2006. ISBN: 9780226199948
- Quote my everyday, Santosh Kalwar, 2010. ISBN: 9781446118634
- Rogers, Everett (16 August 2003). Diffusion of Innovations, 5th Edition. Simon and Schuster. ISBN 978–0–7432–5823–4