89 changes: 88 additions & 1 deletion README.md
@@ -1 +1,88 @@
# in progress, do not fork yet!
# DM Weibo

Logan Clark | Shengye Guo | Zhiyuan Han | Yuhan Ke | Emily Kerns Minougou | Sinae Lee | Jialei Wu

## Introduction

This MVP (minimum viable product) examines patterns of restaurant check-ins in Shenzhen by day of the week, time of day (morning, day, evening, or night), and check-in density. It also looks at the spatial distribution of the check-ins, for example whether restaurants form clusters.

## Research Scope

The area this project focuses on is downtown Shenzhen (Futian District and Luohu District).

## Hypothesis

It is hypothesised that the pattern of restaurant check-ins differs between weekdays and weekends, as well as between day and night. These differences may help detect where activity takes place in the city at different times, and may show how people move through the city between day and night and between weekdays and weekends.

## Methodology

This product explores checkins using the Weibo database on OrientDB. A query was set up to select all checkins in the “Food” category; each checkin is then categorized as morning, day, evening, or night based on the time it occurred. Using the MVP, a user can select the day of the week and time of day they would like to study. They may also examine the density of checkins by selecting the optional heatmap, and may toggle between different days and times of day to compare checkins.

## Minimum Viable Product

In order to create the MVP, the following assumptions were made:
- People may eat in the downtown business area during the day
- At night, the checkins may spread out from the downtown since people may be eating elsewhere
- People mostly check in at restaurants rather than at their homes or offices, so “Food” was selected as the checkin category
- People eat at restaurants and check in using Weibo often enough to generate enough data to analyze the region
- Checkin patterns on the weekends may differ from checkin patterns during the week

Ways to extend the MVP
- Create an animation to show how the checkins move throughout the day in a quick format
- Figure out how to query data more quickly
- Show two or more queries of time of day and day of week at one time to show comparison

## Data Processing

Dataset Used

Weibo database

Describe in detail any processing you did on the data to prepare it for your application

In order to query the “Food” checkins by time of day, a category needed to be created for the type of checkin and a separate category needed to be created for the time of day. The time of day was derived from each checkin's time and assigned to one of the following periods:
- 5am - 11am: Morning
- 11am - 4pm: Day
- 4pm - 9pm: Evening
- 9pm - 5am: Night
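
The time periods above can be expressed as a small classifier. This is a minimal sketch, not the preprocessing actually run against the database: the function name `time_of_day` is hypothetical, and treating each period as including its start hour and excluding its end hour is an assumption, since the boundaries are not specified in that detail.

```python
def time_of_day(hour):
    """Map an hour of the day (0-23) to one of the four periods.

    Assumption: each period includes its start hour and excludes its end hour.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if 5 <= hour < 11:
        return "Morning"
    if 11 <= hour < 16:
        return "Day"
    if 16 <= hour < 21:
        return "Evening"
    # 9pm through 4:59am wraps past midnight
    return "Night"


labels = [time_of_day(h) for h in (8, 12, 18, 23, 2)]
```

Under these assumptions a checkin at 23:00 and one at 02:00 both fall into “Night”, which is the only period that spans midnight.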

## Server Back End

Arguments that are sent from the client and received by the server

The arguments come from the inputs on the client interface. As the user zooms to a particular portion of the map, new bounding latitude and longitude coordinates are generated. The other options selected by the user on the interface form the basis of the query that Flask passes to the database. The server then passes the results back to the client.
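
As a rough illustration of these arguments, the sketch below assembles a `/getData/` request URL. The parameter names (`lat1`, `lng1`, `lat2`, `lng2`, `w`, `h`, `cell_size`, `analysis`, `dayOfWeek`, `timeOfDay`) are the ones app.py reads from the query string. The helper name `build_get_data_url` and all sample values are hypothetical, including the codes used for the day and time of day, whose real format depends on the dropdown values in the client.

```python
from collections import OrderedDict

try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def build_get_data_url(bounds, size, cell_size, analysis, day, tod):
    """Build the query string that app.py's getData() expects.

    bounds is (lat1, lng1, lat2, lng2); size is (w, h) in pixels.
    """
    params = OrderedDict([
        ("lat1", bounds[0]), ("lng1", bounds[1]),
        ("lat2", bounds[2]), ("lng2", bounds[3]),
        ("w", size[0]), ("h", size[1]),
        ("cell_size", cell_size),
        # app.py compares this value against the string "false"
        ("analysis", "true" if analysis else "false"),
        ("dayOfWeek", day), ("timeOfDay", tod),
    ])
    return "/getData/?" + urlencode(params)


# Hypothetical map view over downtown Shenzhen with the heatmap enabled
url = build_get_data_url((22.52, 114.03, 22.58, 114.13), (960, 600), 20, True, 6, 3)
```

Sending `analysis=false` makes the server return the checkin features without computing the heatmap grid.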

Update messages

Our update messages show the number of records returned by the query. The user is then informed that the records are being matched. Finally the user is informed when the operation is complete and the application is idle.

Data sent back to the client at the end of the request

The update messages are pushed to the client through an SSE event stream built with Flask’s Response class. The query results are returned as JSON in the response to the /getData/ request, where they are parsed into JavaScript objects and passed to the user interface.
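
The framing of the update messages can be sketched as follows. Each message becomes one server-sent event of the form `data: <message>` followed by a blank line, which is the format app.py yields from its queue. The helpers `format_sse` and `drain_events` are hypothetical names; unlike app.py's `event_stream()`, which loops forever and blocks on the queue, this sketch stops once the queue is empty so that it terminates. The record count in the sample message is made up.

```python
try:
    from queue import Queue  # Python 3
except ImportError:
    from Queue import Queue  # Python 2


def format_sse(message):
    """Wrap a status message in the SSE 'data:' framing."""
    return "data: %s\n\n" % message


def drain_events(q):
    """Yield one SSE frame per queued message until the queue is empty."""
    while not q.empty():
        yield format_sse(q.get())


q = Queue()
for status in ("starting data query...", "received 3 records", "idle"):
    q.put(status)

frames = list(drain_events(q))
```

On the client, an `EventSource` connected to `/eventSource/` would receive each frame as a separate message event.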

## Server interacts with the database

Query that is sent to the database

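The server, running on localhost, builds the query in app.py and passes it to the database. app.py connects to OrientDB, opens the weibo database, and selects the Checkin records that fall inside the bounding coordinates and match the “Food” category, the selected day of the week, and the selected time of day.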
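
To make the shape of that query concrete, here is a minimal sketch of how the query text could be assembled. The field names (`lat`, `lng`, `cat_1`, `DOW`, `TOD`) and the `Checkin` class come from app.py. The function name `build_checkin_query` is hypothetical, coercing the coordinates to floats is an added safeguard that app.py does not perform, and the integer codes used for `DOW` and `TOD` in the example are assumptions because the stored format of those fields is not documented here.

```python
def build_checkin_query(lat1, lat2, lng1, lng2, day, tod):
    """Return the OrientDB SQL text for 'Food' checkins in a bounding box."""
    template = (
        'SELECT FROM Checkin WHERE lat BETWEEN {} AND {} '
        'AND lng BETWEEN {} AND {} AND cat_1 = "Food" '
        'AND DOW = {} AND TOD = {}'
    )
    # float() rejects anything that is not a number before it reaches the query.
    # day and tod are inserted as-is, as app.py does; if DOW and TOD hold
    # strings they would need to be validated and quoted first.
    return template.format(float(lat1), float(lat2), float(lng1), float(lng2), day, tod)


# Hypothetical bounds and day/time codes
query = build_checkin_query("22.52", "22.58", "114.03", "114.13", 6, 3)
```

The resulting string is what would be handed to `client.command(...)` once the database is open.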

Results of the query are processed and formatted for sending back to the client

The query results are passed back through the server and formatted into JSON to appear on the user interface.
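
The formatting step can be sketched as below. app.py turns each record into a GeoJSON-style feature with an `id`, a `time` and `day` property, and a point geometry, and collects the features in a `FeatureCollection`. The `FakeRecord` class is a hypothetical stand-in for the records returned by pyorient, and the sample values are invented. The sketch keeps app.py's `[lat, lng]` coordinate order, which is the reverse of the `[lng, lat]` order the GeoJSON specification uses.

```python
import datetime
import json

WEEKDAYS = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']


class FakeRecord(object):
    """Hypothetical stand-in for a pyorient record with the fields app.py reads."""

    def __init__(self, rid, lat, lng, time):
        self._rid = rid
        self.lat = lat
        self.lng = lng
        self.time = time


def to_feature(record):
    """Convert one checkin record into a feature dictionary."""
    return {
        "type": "Feature",
        "id": record._rid,
        "properties": {
            "time": str(record.time),
            "day": WEEKDAYS[record.time.weekday()],
        },
        # Same order as app.py: latitude first, then longitude
        "geometry": {"type": "Point", "coordinates": [record.lat, record.lng]},
    }


def to_feature_collection(records):
    return {"type": "FeatureCollection", "features": [to_feature(r) for r in records]}


records = [FakeRecord("#12:0", 22.54, 114.06, datetime.datetime(2014, 6, 7, 19, 30))]
payload = json.dumps(to_feature_collection(records))
```

The client can then read `features[i].properties.day` and `features[i].geometry.coordinates` directly after parsing the JSON.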

## Client Front End

Front end User Interface (UI)

The user interface allows a person to view the “Food” checkins by time period and day. It covers all seven days of the week and divides each day into “morning”, “day”, “evening”, and “night”.

General User Experience (UX) story or narrative

A user may want to view the patterns of “Food” checkins if they are planning to open a restaurant, a new club, a shop, or a trendy venue. They may analyze the patterns to understand where people travel throughout the city and then make an informed decision about where to open their new business. A city may use the data to plan a new project that needs a lot of publicity and high visibility. By using the MVP, it can choose where to place the project based on the density of checkins. If the goal is to attract business people, the query can be used to see checkins on a weekday during the day.


Reproduce and explain in detail any requests you are sending to the server, including any arguments in the query string, and how they are communicating the decisions the user has made in the UI

Describe how the data is visualized using JavaScript/D3

We used a different background color for each time period (i.e. black for evening and night, white for morning and day). We also used a heat map to convey where restaurants with checkins cluster. Information about a restaurant appears when the mouse pointer is over it.
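
The values behind the heat map are computed on the server. As a rough sketch of that calculation, modelled on the loop in app.py: each checkin adds a Gaussian-weighted amount to the grid cells within `spread` cells of its position, and the grid is then rescaled to the range 0 to 1. The function name `heat_grid` is hypothetical, the points are assumed to be already converted to grid coordinates, and returning the grid unchanged when all cells are equal is an added guard against division by zero.

```python
import math


def heat_grid(points, num_w, num_h, spread=12):
    """Accumulate a Gaussian falloff around each (x, y) cell and normalise to 0..1."""
    grid = [[0.0] * num_w for _ in range(num_h)]
    sigma = spread / 2.0

    for px, py in points:
        # Only cells inside the window around the point are touched
        for j in range(max(0, py - spread), min(num_h, py + spread)):
            for i in range(max(0, px - spread), min(num_w, px + spread)):
                dist_sq = (i - px) ** 2 + (j - py) ** 2
                grid[j][i] += 2 * math.exp(-dist_sq / (2 * sigma ** 2))

    lo = min(min(row) for row in grid)
    hi = max(max(row) for row in grid)
    if hi == lo:
        return grid  # empty or flat grid: nothing to rescale

    return [[(v - lo) / (hi - lo) for v in row] for row in grid]


# One hypothetical checkin in the middle of a 10 x 10 grid
grid = heat_grid([(5, 5)], 10, 10, spread=3)
```

The cell containing the checkin ends up with the value 1, its neighbours with smaller values, and cells outside the window stay at 0, which is what lets the client color clusters more strongly than isolated checkins.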
191 changes: 191 additions & 0 deletions app.py
@@ -0,0 +1,191 @@
from flask import Flask
from flask import render_template
from flask import request
from flask import Response

import json
import time
import sys
import random
import math
import datetime

import pyorient

from Queue import Queue

from sklearn import preprocessing
from sklearn import svm

import numpy as np

app = Flask(__name__)

q = Queue()

def point_distance(x1, y1, x2, y2):
    return ((x1-x2)**2.0 + (y1-y2)**2.0)**(0.5)

def remap(value, min1, max1, min2, max2):
    return float(min2) + (float(value) - float(min1)) * (float(max2) - float(min2)) / (float(max1) - float(min1))

def normalizeArray(inputArray):
    maxVal = 0
    minVal = 100000000000

    for j in range(len(inputArray)):
        for i in range(len(inputArray[j])):
            if inputArray[j][i] > maxVal:
                maxVal = inputArray[j][i]
            if inputArray[j][i] < minVal:
                minVal = inputArray[j][i]

    # A flat grid (e.g. no records) has no range to rescale; avoid dividing by zero in remap
    if maxVal == minVal:
        return inputArray

    for j in range(len(inputArray)):
        for i in range(len(inputArray[j])):
            inputArray[j][i] = remap(inputArray[j][i], minVal, maxVal, 0, 1)

    return inputArray

def event_stream():
    while True:
        result = q.get()
        yield 'data: %s\n\n' % str(result)

@app.route('/eventSource/')
def sse_source():
    return Response(
            event_stream(),
            mimetype='text/event-stream')

@app.route("/")
def index():
    return render_template("index.html")

@app.route("/getData/")
def getData():

    q.put("starting data query...")

    lat1 = str(request.args.get('lat1'))
    lng1 = str(request.args.get('lng1'))
    lat2 = str(request.args.get('lat2'))
    lng2 = str(request.args.get('lng2'))

    print request.args.get('w')

    w = float(request.args.get('w'))
    h = float(request.args.get('h'))
    cell_size = float(request.args.get('cell_size'))

    analysis = request.args.get('analysis')

    # Setting up global variables that would be used in the query
    # Getting "day of week" and "time of day" information selected from the dropdown menu on the client side
    dropdownDay = request.args.get('dayOfWeek')
    dropdownTime = request.args.get('timeOfDay')

    print "received coordinates: [" + lat1 + ", " + lat2 + "], [" + lng1 + ", " + lng2 + "]"

    client = pyorient.OrientDB("localhost", 2424)
    session_id = client.connect("root", "michael2464")
    db_name = "weibo"
    db_username = "admin"
    db_password = "admin"

    if client.db_exists( db_name, pyorient.STORAGE_TYPE_MEMORY ):
        client.db_open( db_name, db_username, db_password )
        print db_name + " opened successfully"
    else:
        print "database [" + db_name + "] does not exist! session ending..."
        sys.exit()

    # Set the checkin category to "Food"
    # Query data on the selected "day of week" and "time of day" from the weibo database
    query = 'SELECT FROM Checkin WHERE lat BETWEEN {} AND {} AND lng BETWEEN {} AND {} AND cat_1 = "Food" AND DOW =' + str(dropdownDay) + ' AND TOD = ' + str(dropdownTime)

    records = client.command(query.format(lat1, lat2, lng1, lng2))

    numListings = len(records)

    #print 'received ' + str(numListings) + ' records'
    q.put('received ' + str(numListings) + ' records')


    client.db_close()

    output = {"type":"FeatureCollection","features":[],"time":[]}

    for record in records:

        feature = {"type":"Feature","properties":{},"geometry":{"type":"Point"}}
        feature["id"] = record._rid
        feature["properties"]["time"] = str(record.time)
        feature["geometry"]["coordinates"] = [record.lat, record.lng]

        dateTimeList = str(record.time).split()
        date = dateTimeList[0]
        dateList = date.split('-')
        month = str(dateList[1])
        day = int(dateList[2])

        #calculating which day in the week
        seven = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
        feature["properties"]["day"] = seven[record.time.weekday()]

        output["features"].append(feature)


    if analysis == "false":
        #q.put('idle')
        return json.dumps(output)

    #Calculating the heat map
    q.put('starting analysis...')

    output["analysis"] = []

    numW = int(math.floor(w/cell_size))
    numH = int(math.floor(h/cell_size))

    grid = []

    for j in range(numH):
        grid.append([])
        for i in range(numW):
            grid[j].append(0)

    for record in records:

        pos_x = int(remap(record.lng, lng1, lng2, 0, numW))
        pos_y = int(remap(record.lat, lat1, lat2, numH, 0))

        spread = 12

        for j in range(max(0, (pos_y-spread)), min(numH, (pos_y+spread))):
            for i in range(max(0, (pos_x-spread)), min(numW, (pos_x+spread))):
                grid[j][i] += 2 * math.exp((-point_distance(i,j,pos_x,pos_y)**2)/(2*(spread/2)**2))

    grid = normalizeArray(grid)

    offsetLeft = (w - numW * cell_size) / 2.0
    offsetTop = (h - numH * cell_size) / 2.0

    for j in range(numH):
        for i in range(numW):
            newItem = {}

            newItem['x'] = offsetLeft + i*cell_size
            newItem['y'] = offsetTop + j*cell_size
            newItem['width'] = cell_size-1
            newItem['height'] = cell_size-1
            newItem['value'] = grid[j][i]

            output["analysis"].append(newItem)


    q.put('idle')
    return json.dumps(output)


if __name__ == "__main__":
    app.run(host='0.0.0.0',port=5000,debug=True,threaded=True)