As a data engineer or analyst, I often work with various types of data. One of the most common tasks is data acquisition, which is unavoidable in any project. In this post, I will share a data connection configuration model that I use regularly for my data analysis tasks.
**MySQL Database**
MySQL is one of the most widely used relational databases. In my daily work, I mainly focus on reading and writing data, while deletion and update operations are typically handled by developers. Therefore, I haven’t included those functions in my code.
To connect to MySQL, I use the `MySQLdb` library along with `pandas` for data manipulation and `sqlalchemy` for creating database connections. Here's an example of a connection class:
```python
import MySQLdb
import pandas as pd
from sqlalchemy import create_engine

class ConAnalyze:
    """Data Analysis Platform Connection"""

    def __init__(self, database='myanalyze'):
        self.database = database
        self.conn = None

    def connect(self):
        self.conn = MySQLdb.connect(host='***', user='root', passwd='***',
                                    db=self.database, charset='utf8')

    def query(self, sql):
        try:
            self.connect()
            data = pd.read_sql(sql, self.conn)
        except (AttributeError, MySQLdb.OperationalError):
            # Reconnect and retry once if the connection was lost or never established
            self.connect()
            data = pd.read_sql(sql, self.conn)
        return data

    def store(self, mydataframe, table_name, if_exists='replace'):
        conn2 = "mysql+mysqldb://root:***@***:3306/%s" % self.database
        local_engine = create_engine(conn2)
        mydataframe.to_sql(table_name, local_engine, if_exists=if_exists,
                           index=False, chunksize=10000)
```
This class allows you to query and store data efficiently. The `query()` method handles reconnections in case of errors, and the `store()` method uses `to_sql` from pandas to write data into the database. If your dataframe is large, the `chunksize` parameter helps manage memory usage by processing data in smaller batches.
I also recommend using the `tenacity` library for more robust retry logic in your queries.
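To show the idea without pulling in a dependency, here is a minimal stdlib sketch of the kind of retry decorator `tenacity` provides; the function and parameter names (`retry`, `times`, `delay`) are my own, and real `tenacity` adds backoff, jitter, and richer stop conditions on top of this:

```python
import time

def retry(times=3, delay=0.1, exceptions=(Exception,)):
    """Retry a function up to `times` attempts, sleeping `delay` seconds between them."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            last_exc = None
            for attempt in range(times):
                try:
                    return fn(*args, **kwargs)
                except exceptions as exc:
                    last_exc = exc
                    time.sleep(delay)
            # All attempts failed: re-raise the last error
            raise last_exc
        return wrapper
    return decorator

calls = {'n': 0}

@retry(times=3, delay=0)
def flaky_query():
    # Fails twice, then succeeds -- stands in for a dropped DB connection
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionError("lost connection")
    return "rows"

print(flaky_query())  # succeeds on the third attempt
```

Wrapping `query()` with a decorator like this keeps the reconnect policy in one place instead of scattering try/except blocks through every method.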
**MongoDB**
MongoDB is a NoSQL database that stores data in a JSON-like format. When working with MongoDB, it’s common to retrieve data using Python’s `pymongo` library. Here’s a simple example of how to connect and query data:
```python
import pymongo
import pandas as pd

class Conn_Mongo:
    """MongoDB Connection"""

    def __init__(self):
        # `utoken` database, which holds the user table
        self.mongo_utoken = pymongo.MongoClient('mongodb://***:27000').utoken

    def get_user_data_mongo(self, list_id):
        # Match documents whose FToken appears in the given list of IDs
        user_data = pd.DataFrame(list(
            self.mongo_utoken.userinfo.find({'FToken': {'$in': list(list_id)}})))
        return user_data
```
In this example, we query MongoDB using a list of IDs and convert the result into a pandas DataFrame for further analysis.
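To make the shape of that query concrete without needing a live server, here is an illustrative sketch: the sample documents and field names below are invented, but the `{'$in': ...}` filter is exactly what the class builds, and the dict-comprehension mirrors how the documents become DataFrame rows (dropping Mongo's internal `_id` first):

```python
# Invented sample documents, shaped like what pymongo's find() would return
sample_docs = [
    {'_id': 'a1', 'FToken': 'tok1', 'name': 'alice'},
    {'_id': 'a2', 'FToken': 'tok2', 'name': 'bob'},
    {'_id': 'a3', 'FToken': 'tok9', 'name': 'carol'},
]

list_id = ['tok1', 'tok2']
query = {'FToken': {'$in': list(list_id)}}  # same filter the class passes to find()

# Apply the filter locally and strip `_id` before analysis
matched = [
    {k: v for k, v in doc.items() if k != '_id'}
    for doc in sample_docs
    if doc['FToken'] in query['FToken']['$in']
]
print(matched)  # two rows: alice and bob
```

Dropping `_id` up front avoids carrying `ObjectId` values into the DataFrame, where they rarely serve any analytical purpose.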
**Flurry API Integration**
If your work involves mobile app analytics, Flurry is a powerful tool for collecting and analyzing user behavior. It provides APIs that allow you to fetch data programmatically.
To use Flurry, you first need to obtain an **App Access Token**, which you can find on the Flurry dashboard. Once you have the token, you can build URLs to request specific data.
Here’s an example of a Flurry API class:
```python
import pandas as pd
import requests

class Conn_Flurry:
    """Flurry API Data"""

    api_token = "******.****.****"
    headers = {'Authorization': 'Bearer {}'.format(api_token)}

    def get_results(self, url="https://api-metrics.flurry.com/public/v1/data/appEvent/day/app?metrics=activeDevices,newDevices,averageTimePerDevice&dateTime=2017-05-23/2017-05-24"):
        data = requests.get(url, headers=self.headers)
        cleaned = data.json()  # parse the JSON response body
        cleaned = pd.DataFrame(cleaned['rows'])
        return cleaned

    def get_url(self, table='appEvent', timegrain='day', dimensions='app/event',
                metrics='occurrences', dateTime='2017-05-23/2017-05-24', filters=""):
        endpoint = "https://api-metrics.flurry.com/public/v1/data"
        url = "{}/{}/{}/{}?metrics={}&dateTime={}&filters={}".format(
            endpoint, table, timegrain, dimensions, metrics, dateTime, filters)
        return url
```
This class includes two main methods: `get_url()` to construct the correct API URL based on your requirements, and `get_results()` to fetch and process the data returned by the API.
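Because the URL assembly is just string formatting, it is easy to lift out and check in isolation. The sketch below is my own standalone copy of the `get_url()` logic (the helper name `build_flurry_url` is invented); it builds a request URL without touching the network:

```python
ENDPOINT = "https://api-metrics.flurry.com/public/v1/data"

def build_flurry_url(table='appEvent', timegrain='day', dimensions='app/event',
                     metrics='occurrences', date_range='2017-05-23/2017-05-24',
                     filters=''):
    """Standalone mirror of the class's URL formatting, free of any instance state."""
    return "{}/{}/{}/{}?metrics={}&dateTime={}&filters={}".format(
        ENDPOINT, table, timegrain, dimensions, metrics, date_range, filters)

url = build_flurry_url(dimensions='app', metrics='activeDevices')
print(url)
```

Keeping the URL builder separate from the HTTP call makes it trivial to unit-test the formatting and to paste a generated URL into a browser when debugging a request.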
When working with Flurry, be mindful of date ranges and formatting: the end of the range is exclusive. For instance, `'2017-09/2017-10'` covers all of September but none of October. This detail is crucial when calculating metrics over specific periods.
**Conclusion**
Whether you're working with SQL databases, NoSQL databases, or third-party platforms like Flurry, having a consistent and well-documented connection model is essential. By encapsulating these interactions into reusable classes, you can streamline your workflow, reduce errors, and make your code more maintainable.
Feel free to adapt these examples to fit your specific needs. If you have any questions or need further clarification, leave a comment below—I’ll do my best to help!