yndx-metrika / logs_api_integration
Script for integration with Logs API
I run the script with the default parameters from the README. Judging by the DATA SAMPLE, the script receives the data successfully but cannot parse it. What am I doing wrong? Thank you!
('##### python', '2.7.15')
2021-02-19 23:30:00 MainProcess INFO CLI Options: Namespace(end_date=None, mode='regular', source='visits', start_date=None)
2021-02-19 23:30:00 MainProcess INFO UserRequest(token=u'xxxxxxx', counter_id=u'24169438', start_date_str='2021-02-17', end_date_str='2021-02-17', source='visits', fields=(u'ym:s:counterID', u'ym:s:dateTime', u'ym:s:date', u'ym:s:clientID'))
2021-02-19 23:30:01 MainProcess INFO ### CREATING TASK
{
"date1_str": "2021-02-17",
"date2_str": "2021-02-17",
"request_id": 15071596,
"status": "created",
"user_request": [
"xxxxxxxxx",
"24169438",
"2021-02-17",
"2021-02-17",
"visits",
[
"ym:s:counterID",
"ym:s:dateTime",
"ym:s:date",
"ym:s:clientID"
]
]
}
2021-02-19 23:30:02 MainProcess INFO ### DELAY 20 secs
2021-02-19 23:30:22 MainProcess INFO ### CHECKING STATUS
2021-02-19 23:30:22 MainProcess INFO API Request status: processed
2021-02-19 23:30:22 MainProcess INFO ### SAVING DATA
2021-02-19 23:30:22 MainProcess INFO Part #0
2021-02-19 23:30:23 MainProcess INFO ### DATA SAMPLE
2021-02-19 23:30:23 MainProcess INFO ym:s:clientID ym:s:counterID ym:s:date ym:s:dateTime
1613536812404233133 24169438 2021-02-17 2021-02-17 07:40:13
1613543552696180493 24169438 2021-02-17 2021-02-17 09:32:33
1613564820503177582 24169438 2021-02-17 2021-02-17 15:27:09
16135789602285303 24169438 2021-02-17 2021-02-17 19:22:40
2021-02-19 23:30:23 MainProcess WARNING 1 rows were filtered out
2021-02-19 23:30:23 MainProcess CRITICAL Iteration #1 failed
2021-02-19 23:30:23 MainProcess CRITICAL Code: 117, e.displayText() = DB::Exception: Unknown field found in TSV header: '1613536812404233133' at position 0
Set the 'input_format_skip_unknown_fields' parameter explicitly to ignore and proceed (version 21.2.3.15 (official build))
Good day.
The documentation pages list fields that are missing from ch_types.json.
The full list of fields currently missing:
ym:pv:GCLID, ym:pv:regionCityID, ym:pv:regionCountryID, ym:s:firstGCLID, ym:s:lastGCLID, ym:s:lastSignificantGCLID, ym:s:offlineCallFirstTimeCaller, ym:s:offlineCallHoldDuration, ym:s:offlineCallMissed, ym:s:offlineCallTag, ym:s:offlineCallTalkDuration, ym:s:offlineCallURL, ym:s:regionCityID, ym:s:regionCountryID
When exporting visits, I got the error 'map' object is not subscriptable, raised at line 135 of clickhouse.py.
An extra list wrapper on line 132
ch_fields = list(map(get_ch_field_name, fields))
seems to solve the problem, but this means the current version does not work out of the box.
Am I doing something wrong, and should this error not occur at all?
Is this an acceptable fix? I took the first result from Google; Python is not my native language.
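A minimal sketch of why the list wrapper helps, assuming the script is run under Python 3: there map returns a lazy iterator that cannot be indexed, whereas under Python 2 it returned a list. The get_ch_field_name below is a hypothetical stand-in for the function in clickhouse.py, not its actual code.

```python
def get_ch_field_name(field_name):
    # Hypothetical stand-in: drop the "ym:s:" / "ym:pv:" prefix and
    # capitalize the first letter, e.g. "ym:s:clientID" -> "ClientID".
    short = field_name.split(":")[-1]
    return short[0].upper() + short[1:]


fields = ["ym:s:counterID", "ym:s:dateTime", "ym:s:date", "ym:s:clientID"]

# Under Python 3 this is an iterator; indexing it raises TypeError.
lazy_fields = map(get_ch_field_name, fields)

# Materializing the result restores indexing on both Python 2 and 3.
ch_fields = list(map(get_ch_field_name, fields))

# An equivalent list comprehension.
ch_fields_alt = [get_ch_field_name(f) for f in fields]

for i in range(len(fields)):
    print(fields[i], "->", ch_fields[i])
```

Either form gives an indexable list, so the loop that reads ch_fields[i] can run.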
Hello!
The following error occurs:
user% python2 metrica_logs_api.py -mode history -source visits
('##### python', '2.7.18rc1')
2020-07-05 10:20:10 MainProcess INFO CLI Options: Namespace(end_date=None, mode='history', source='visits', start_date=None)
2020-07-05 10:20:11 MainProcess INFO UserRequest(token=u'AgAAAAAMTPnoKgyV17Kq854QFS3CpB-1PEnsYvL', counter_id=u'12345678', start_date_str=u'2019-09-06', end_date_str='2020-07-03', source='visits', fields=(u'ym:s:counterID', u'ym:s:dateTime', u'ym:s:date', u'ym:s:clientID'))
2020-07-05 10:20:12 MainProcess INFO ### CREATING TASK
{
"date1_str": "2019-09-06",
"date2_str": "2020-07-03",
"request_id": 9797572,
"status": "created",
"user_request": [
"AgAAAAAMTPnoKgyV17Kq854QFS3CpB-1PEnsYvL",
"12345678",
"2019-09-06",
"2020-07-03",
"visits",
[
"ym:s:counterID",
"ym:s:dateTime",
"ym:s:date",
"ym:s:clientID"
]
]
}
2020-07-05 10:20:12 MainProcess INFO ### DELAY 20 secs
2020-07-05 10:20:32 MainProcess INFO ### CHECKING STATUS
2020-07-05 10:20:33 MainProcess INFO API Request status: created
2020-07-05 10:20:33 MainProcess INFO ### DELAY 20 secs
2020-07-05 10:20:53 MainProcess INFO ### CHECKING STATUS
2020-07-05 10:20:53 MainProcess INFO API Request status: created
2020-07-05 10:20:53 MainProcess INFO ### DELAY 20 secs
2020-07-05 10:21:13 MainProcess INFO ### CHECKING STATUS
2020-07-05 10:21:14 MainProcess INFO API Request status: processed
2020-07-05 10:21:14 MainProcess INFO ### SAVING DATA
2020-07-05 10:21:14 MainProcess INFO Part #0
2020-07-05 10:21:14 MainProcess INFO ### DATA SAMPLE
2020-07-05 10:21:14 MainProcess INFO ym:s:clientID ym:s:counterID ym:s:date ym:s:dateTime
1568225575123456785 12345678 2019-09-18 2019-09-18 23:42:36
1584613993123456781 12345678 2020-03-19 2020-03-19 13:33:12
1584723853123456788 12345678 2020-03-20 2020-03-20 20:04:26
1584993995123456786 12345678 2020-05-08 2020-05-08 21:27:36
2020-07-05 10:21:14 MainProcess WARNING 1 rows were filtered out
2020-07-05 10:21:14 MainProcess INFO Table created
2020-07-05 10:21:14 MainProcess CRITICAL Iteration #1 failed
2020-07-05 10:21:14 MainProcess CRITICAL Code: 117, e.displayText() = DB::Exception: Unknown field found in TSV header: 'ym:s:clientID' at position 0
Set the 'input_format_skip_unknown_fields' parameter explicitly to ignore and proceed (version 20.5.2.7 (official build))
Traceback (most recent call last):
File "metrica_logs_api.py", line 128, in <module>
integrate_with_logs_api(config, user_request)
File "metrica_logs_api.py", line 108, in integrate_with_logs_api
raise e
ValueError: Code: 117, e.displayText() = DB::Exception: Unknown field found in TSV header: 'ym:s:clientID' at position 0
Set the 'input_format_skip_unknown_fields' parameter explicitly to ignore and proceed (version 20.5.2.7 (official build))
Contents of configs/config.json
{
"token" : "AgAAAAAMTPnoKgyV17Kq854QFS3CpB-1PEnsYvL",
"counter_id": "12345678",
"disable_ssl_verification_for_clickhouse": 0,
"visits_fields": [
"ym:s:counterID",
"ym:s:dateTime",
"ym:s:date",
"ym:s:clientID"
],
"hits_fields": [
"ym:pv:counterID",
"ym:pv:dateTime",
"ym:pv:date",
"ym:pv:clientID"
],
"log_level": "INFO",
"retries": 1,
"retries_delay": 60,
"clickhouse": {
"host": "http://localhost:8123",
"user": "default",
"password": "123",
"visits_table": "visits_test",
"hits_table": "hits_test",
"database": "app_test"
}
}
Clickhouse
Connected to ClickHouse server v20.5.2.
:) show tables;
┌─name────────┐
│ visits_test │
└─────────────┘
Ok. 1 row in set. Elapsed: 0.005 sec. Processed: 0 rows, 0.0B (0 rows/s, 0.0B/s)
:) desc table visits_test;
┌─name──────┬─type─────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ ClientID │ UInt64 │ │ │ │ │ │
│ CounterID │ UInt32 │ │ │ │ │ │
│ Date │ Date │ │ │ │ │ │
│ DateTime │ DateTime │ │ │ │ │ │
└───────────┴──────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
Ok. 4 rows in set. Elapsed: 0.002 sec. Processed: 0 rows, 0.0B (0 rows/s, 0.0B/s)
What could be the problem, and where should I dig next? Thanks in advance.
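One detail visible in the report above: the TSV header carries the API names (ym:s:clientID, ...), while the table columns are named ClientID, CounterID, Date, DateTime. A hedged sketch of rewriting the header line to the table's column names is shown below. The helper names are hypothetical, the naming rule is inferred only from the four columns shown, and this is not confirmed to be the cause of, or the fix for, the error.

```python
def to_ch_name(api_field):
    # Assumed rule, inferred from the columns above:
    # "ym:s:clientID" -> "ClientID".
    short = api_field.split(":")[-1]
    return short[0].upper() + short[1:]


def rename_header(tsv_text):
    # Rewrite only the first (header) line; data rows stay untouched.
    header, sep, rest = tsv_text.partition("\n")
    new_header = "\t".join(to_ch_name(f) for f in header.split("\t"))
    return new_header + sep + rest


sample = (
    "ym:s:clientID\tym:s:counterID\tym:s:date\tym:s:dateTime\n"
    "1568225575123456785\t12345678\t2019-09-18\t2019-09-18 23:42:36\n"
)
print(rename_header(sample))
```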
Good afternoon!
When I try to fetch visit data after 2017-02-08, I run into this error:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 4744-4749: ordinal not in range(128)
Logged from file metrica_logs_api.py, line 106
Traceback (most recent call last):
File "metrica_logs_api.py", line 128, in <module>
integrate_with_logs_api(config, user_request)
File "metrica_logs_api.py", line 108, in integrate_with_logs_api
raise e
ValueError: <exception str() failed>
All visits before that date were exported without problems, but later visits consistently produce this error. As for the config, I export absolutely all fields.
I also tried exporting visits starting from 2017-02-09 (in case the data types stored by Metrica had changed) into a different table (not the one where visits before 2017-02-09 were successfully collected), but that did not work.
Please help.
I was exporting a year of data and an error occurred. Now it is unclear which data was loaded and which was not.
2017-03-03 15:49:30 MainProcess INFO Part #6
2017-03-03 15:49:30 MainProcess INFO Starting new HTTPS connection (1): api-metrika.yandex.ru
2017-03-03 15:49:36 MainProcess CRITICAL Iteration #1 failed
Traceback (most recent call last):
File "metrica_logs_api.py", line 127, in <module>
integrate_with_logs_api(config, user_request)
File "metrica_logs_api.py", line 107, in integrate_with_logs_api
raise e
MemoryError
2017-03-03 15:25:05 MainProcess INFO ### DATA SAMPLE
2017-03-03 15:25:05 MainProcess INFO ym:s:browser ym:s:browserCountry ym:s:browserEngine ym:s:browserEngineVersion1 ym:s:browserEngineVersion2 ym:s:browserEngineVersion3 ym:s:browserEngineVersion4 ym:s:browserLanguage ym:s:browserMajorVersion ym:s:browserMinorVersion ym:s:clientID ym:s:clientTimeZone ym:s:counterID ym:s:date ym:s:dateTime ym:s:dateTimeUTC ym:s:deviceCategory ym:s:endURL ym:s:from ym:s:goalsDateTime ym:s:goalsID ym:s:goalsOrder ym:s:goalsPrice ym:s:goalsSerialNumber ym:s:ipAddress ym:s:isNewUser ym:s:mobilePhone ym:s:mobilePhoneModel ym:s:operatingSystem ym:s:operatingSystemRoot ym:s:physicalScreenHeight ym:s:physicalScreenWidth ym:s:refererym:s:regionCity ym:s:regionCountry ym:s:screenColors ym:s:screenFormat ym:s:screenHeight ym:s:screenOrientation ym:s:screenWidth ym:s:startURL ym:s:visitID ym:s:watchIDs ym:s:windowClientHeight ym:s:windowClientWidth
yandex_internet 0 0 0 0 0 30066 18 \00 0 240 421224 2013-02-11 2013-02-11 16:29:42 2013-02-11 16:29:42 1 [] [] [] [] [] 178.248.66.xxx 1 windows_xp windows 0 0Moscow Russia 32 37 768 2 1366 9678942258 [] 673 1345
Row 1:
Column 0, name: Browser, type: String, parsed text: "yandex_internet"
Column 1, name: BrowserCountry, type: String, parsed text: "0"
Column 2, name: BrowserEngine, type: String, parsed text:
Column 3, name: BrowserEngineVersion1, type: UInt16, parsed text: "0"
Column 4, name: BrowserEngineVersion2, type: UInt16, parsed text: "0"
Column 5, name: BrowserEngineVersion3, type: UInt16, parsed text: "0"
Column 6, name: BrowserEngineVersion4, type: UInt16, parsed text: "0"
Column 7, name: BrowserLanguage, type: String, parsed text: "30066"
Column 8, name: BrowserMajorVersion, type: UInt16, parsed text: "18"
Column 9, name: BrowserMinorVersion, type: UInt16, ERROR: text "000240" is not like UInt16
When I try to load data from Metrica, problems occur with arrays of strings that contain special characters.
ValueError: Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected [ before: "['
The load is performed by the script
https://yandex.ru/dev/metrika/doc/api2/logs/clickhouse-integration-docpage/
To make it work with such characters, the following was added to it:
reload(sys)
sys.setdefaultencoding('utf8')
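The two lines above are a Python 2-only workaround: neither the reload builtin nor sys.setdefaultencoding exists in Python 3. A hedged alternative sketch is to encode text explicitly before sending it. The helper name to_utf8_bytes is hypothetical, and whether this resolves the parse error above has not been verified.

```python
def to_utf8_bytes(text):
    # Pass bytes through unchanged; encode text explicitly as UTF-8
    # instead of relying on the interpreter's default encoding.
    if isinstance(text, bytes):
        return text
    return text.encode("utf-8")


row = u"['Имя1', 'Имя2']"
payload = to_utf8_bytes(row)
print(len(row), len(payload))
```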
The API recently began requiring the OAuth token to be passed in the request headers, not just in the URL. As a result, every request to the API now fails with the message "user not authorized".
The problem is easily fixed by passing the OAuth token in the request headers, e.g. r = requests.get(url, headers={'Authorization': 'OAuth ' + user_request.token})
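The header-based fix can be sketched as a small helper. The function names build_auth_headers and api_get are hypothetical, not part of the script; only the 'Authorization': 'OAuth <token>' header format comes from the report above.

```python
def build_auth_headers(token):
    # Header format taken from the report: "OAuth " followed by the token.
    return {"Authorization": "OAuth " + token}


def api_get(url, token):
    # Hypothetical wrapper; requests is imported lazily so the header
    # helper can be used and tested without the library installed.
    import requests
    return requests.get(url, headers=build_auth_headers(token))


print(build_auth_headers("<your-token>"))
```

Keeping the header construction in one place means each API call in the script can reuse it instead of repeating the string concatenation.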
Hello!
The following error occurs:
File "/Users/dmitrij/Downloads/logs_api_integration-master/metrica_logs_api.py", line 124, in <module>
user_request.source):
File "/Users/dmitrij/Downloads/logs_api_integration-master/clickhouse.py", line 162, in is_data_present
if not is_db_present():
File "/Users/dmitrij/Downloads/logs_api_integration-master/clickhouse.py", line 86, in is_db_present
return CH_DATABASE in get_dbs()
File "/Users/dmitrij/Downloads/logs_api_integration-master/clickhouse.py", line 76, in get_dbs
return get_clickhouse_data('SHOW DATABASES')
File "/Users/dmitrij/Downloads/logs_api_integration-master/clickhouse.py", line 28, in get_clickhouse_data
r = requests.post(host, data=query, verify=SSL_VERIFY)
File "/Users/dmitrij/anaconda3/lib/python3.7/site-packages/requests/api.py", line 116, in post
return request('post', url, data=data, json=json, **kwargs)
File "/Users/dmitrij/anaconda3/lib/python3.7/site-packages/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/Users/dmitrij/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/Users/dmitrij/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/Users/dmitrij/anaconda3/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8123): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1050bfa58>: Failed to establish a new connection: [Errno 61] Connection refused'))
Running the metrica_logs_api.py script produced this error:
2019-09-09 13:24:29 MainProcess INFO Table created
2019-09-09 13:24:29 MainProcess CRITICAL Iteration #1 failed
2019-09-09 13:24:29 MainProcess CRITICAL 'map' object is not subscriptable
Traceback (most recent call last):
File "metrica_logs_api.py", line 128, in <module>
integrate_with_logs_api(config, user_request)
File "metrica_logs_api.py", line 108, in integrate_with_logs_api
raise e
File "metrica_logs_api.py", line 100, in integrate_with_logs_api
logs_api.save_data(api_request, part)
File "C:\Users\Professional\OneDrive\rustam\clickhouse_pos-shop\logs_api_integration-master\logs_api.py", line 179, in save_data
output_data)
File "C:\Users\Professional\OneDrive\rustam\clickhouse_pos-shop\logs_api_integration-master\clickhouse.py", line 155, in save_data
create_table(source, fields)
File "C:\Users\Professional\OneDrive\rustam\clickhouse_pos-shop\logs_api_integration-master\clickhouse.py", line 135, in create_table
field_statements.append(field_tmpl.format(name= ch_fields[i],
TypeError: 'map' object is not subscriptable
This is fixed by changing the following line in the create_table function in clickhouse.py:
ch_fields = map(get_ch_field_name, fields)
to:
ch_fields = list(map(get_ch_field_name, fields))
Hello.
I am trying to load the visits and hits tables from the Yandex.Metrica Logs API into ClickHouse, following the guide at https://nbviewer.jupyter.org/github/miptgirl/misc_code/blob/master/webinar_case.ipynb
Visits were exported successfully with the command
python metrica_logs_api.py -mode history -source visits
but hits were not: running python metrica_logs_api.py -mode history -source hits
prints this error in the terminal:
2017-04-27 11:09:38 MainProcess INFO Starting new HTTPS connection (1): api-metrika.yandex.ru
2017-04-27 11:09:39 MainProcess CRITICAL Iteration #1 failed
Traceback (most recent call last):
File "metrica_logs_api.py", line 127, in <module>
integrate_with_logs_api(config, user_request)
File "metrica_logs_api.py", line 107, in integrate_with_logs_api
raise e
ValueError: <exception str() failed>
Could you please tell me what the error is?
('##### python', '2.7.14')
2018-02-06 12:21:40 MainProcess INFO CLI Options: Namespace(end_date='2017-08-10', mode=None, source='hits', start_date='2017-08-09')
2018-02-06 12:21:40 MainProcess INFO UserRequest(token=u'%removed%', counter_id=u'%removed%', start_date_str='2017-08-09', end_date_str='2017-08-10', source='hits', fields=(u'ym:pv:date', u'ym:pv:clientID', u'ym:pv:URL', u'ym:pv:regionCountry', u'ym:pv:deviceCategory'))
2018-02-06 12:21:40 MainProcess CRITICAL Iteration # 1 failed
Traceback (most recent call last):
File "yandex_logs_api_integration/metrica_logs_api.py", line 127, in <module>
integrate_with_logs_api(config, user_request)
File "yandex_logs_api_integration/metrica_logs_api.py", line 107, in integrate_with_logs_api
raise e
ZeroDivisionError: division by zero
When I ask the YM API to evaluate whether a query is possible for the given time frame and list of fields, Yandex returns the following: {'log_request_evaluation': {'max_possible_day_quantity': 0, 'possible': False}}
Apparently the library does not handle this response properly.
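A minimal sketch of guarding against such a response before splitting the date range, assuming the split divides the number of days by max_possible_day_quantity. The function name plan_requests and the ceiling-based split are illustrative assumptions, not the library's actual code.

```python
import math


def plan_requests(days, max_possible_day_quantity):
    # Refuse to plan when the API reports no quota: a zero value would
    # divide by zero, and a negative value would yield negative counts.
    if max_possible_day_quantity <= 0:
        raise ValueError(
            "Logs API reports the request is not possible "
            "(max_possible_day_quantity=%d)" % max_possible_day_quantity
        )
    num_requests = int(math.ceil(days / float(max_possible_day_quantity)))
    days_in_period = int(math.ceil(days / float(num_requests)))
    return num_requests, days_in_period


print(plan_requests(18, 6))
```

Raising a descriptive error here surfaces the API's "possible": false answer to the user instead of a ZeroDivisionError from deeper in the script.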
Good afternoon,
We ran into a problem: during loading, ClickHouse cannot parse the input data.
DB::Exception: Cannot parse input: expected \n before: \tym:s:date\tym:s:dateTime\tym:s:goalsID\tym:s:isNewUser\tym:s:lastAdvEngine\tym:s:lastClickBannerGroupName\tym:s:lastDirectClickOrder\tym:s:lastDirectClickOrderName\tym, e.what() = DB::Exception
Full log (the token and site URL are changed; the counterId is real):
C:\Users\artem.gusev\Desktop\logs_api_integration-master>python metrica_logs_api.py -source visits -start_date 2017-07-21 -end_date 2017-07-25
2017-07-31 20:24:32 MainProcess INFO CLI Options: Namespace(end_date='2017-07-25', mode=None, source='visits', start_date='2017-07-21')
2017-07-31 20:24:32 MainProcess INFO UserRequest(token=u'xxx', counter_id=u'11881159', start_date_str='2017-07-21', end_date_str='2017-07-25', source='visits', fields=(u'ym:s:counterID', u'ym:s:dateTime', u'ym:s:date', u'ym:s:visitDuration', u'ym:s:bounce', u'ym:s:pageViews', u'ym:s:goalsID', u'ym:s:clientID', u'ym:s:lastTrafficSource', u'ym:s:lastAdvEngine', u'ym:s:lastSearchEngineRoot', u'ym:s:visitID', u'ym:s:startURL', u'ym:s:browser', u'ym:s:isNewUser', u'ym:s:lastReferalSource', u'ym:s:referer', u'ym:s:lastDirectClickOrder', u'ym:s:UTMCampaign', u'ym:s:UTMContent', u'ym:s:UTMMedium', u'ym:s:UTMSource', u'ym:s:UTMTerm', u'ym:s:regionCity', u'ym:s:lastDirectClickOrderName', u'ym:s:lastClickBannerGroupName'))
2017-07-31 20:24:33 MainProcess INFO ### CREATING TASK
{
"date1_str": "2017-07-21",
"date2_str": "2017-07-25",
"request_id": 189036,
"status": "created",
"user_request": [
"xxx",
"11881159",
"2017-07-21",
"2017-07-25",
"visits",
[
"ym:s:counterID",
"ym:s:dateTime",
"ym:s:date",
"ym:s:visitDuration",
"ym:s:bounce",
"ym:s:pageViews",
"ym:s:goalsID",
"ym:s:clientID",
"ym:s:lastTrafficSource",
"ym:s:lastAdvEngine",
"ym:s:lastSearchEngineRoot",
"ym:s:visitID",
"ym:s:startURL",
"ym:s:browser",
"ym:s:isNewUser",
"ym:s:lastReferalSource",
"ym:s:referer",
"ym:s:lastDirectClickOrder",
"ym:s:UTMCampaign",
"ym:s:UTMContent",
"ym:s:UTMMedium",
"ym:s:UTMSource",
"ym:s:UTMTerm",
"ym:s:regionCity",
"ym:s:lastDirectClickOrderName",
"ym:s:lastClickBannerGroupName"
]
]
}
2017-07-31 20:24:33 MainProcess INFO ### DELAY 20 secs
2017-07-31 20:24:53 MainProcess INFO ### CHECKING STATUS
2017-07-31 20:24:53 MainProcess INFO API Request status: created
2017-07-31 20:24:53 MainProcess INFO ### DELAY 20 secs
2017-07-31 20:25:13 MainProcess INFO ### CHECKING STATUS
2017-07-31 20:25:14 MainProcess INFO API Request status: created
2017-07-31 20:25:14 MainProcess INFO ### DELAY 20 secs
2017-07-31 20:25:34 MainProcess INFO ### CHECKING STATUS
2017-07-31 20:25:34 MainProcess INFO API Request status: processed
2017-07-31 20:25:34 MainProcess INFO ### SAVING DATA
2017-07-31 20:25:34 MainProcess INFO Part #0
2017-07-31 20:25:43 MainProcess INFO ### DATA SAMPLE
2017-07-31 20:25:43 MainProcess INFO ym:s:bounce ym:s:browser ym:s:clientID ym:s:counterID ym:s:date
ym:s:dateTime ym:s:goalsID ym:s:isNewUser ym:s:lastAdvEngine ym:s:lastClickBannerGroupName ym:s:lastDirectClickOrder ym:s:lastDirectClickOrderName ym:s:lastReferalSource ym:s:lastSearchEngineRoot ym:s:lastTrafficSource ym:s:pageViews ym:s:referer ym:s:regionCity ym:s:startURL ym:s:UTMCampaign ym:s:UTMContent ym:s:UTMMedium ym:s:UTMSource ym:s:UTMTerm ym:s:visitDuration ym:s:visitID
0 chromemobile 1500805991669493906 11881159 2017-07-23 2017-07-23 14:27:39 [2653102,4081657,6236337,4081657,6236337,4081657,6236337,4081657,6236337,4091284,6236337,6236751,6236754,4091284,6236337,6236751,6236754,4081657,6236337] 1 unknown 0 ad 9 http://zzz.ru/?admitad_uid=6613ac1c9fa84ebbf077f558d22b2164&advcake=1 admitad 278512 cpa advcake 198
4178276555863625609
0 yandex_browser 1498716776604298382 11881159 2017-07-21 2017-07-21 10:05:24 [4024870,6236871,2653102,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871,4091284,4256032,6236337,6236751,6236754,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871]
0 ya_undefined 0 yandex.ru yandex organic 24 http://yandex.ru/clck/jsredir?from=yandex.ru;search%2F;web;;&text=&etext=1488&&l10n=ru Vidnoe http://zzz.ru/tury
829 4128853169167929226
0 yandex_browser 149737450090090447 11881159 2017-07-22 2017-07-22 00:28:35 [4091284,6236751,6236754] 0 ya_undefined 0 direct 1 Severodvinsk
http://zzz.ru/turkey/resorts/alanya/hotels/tac-premier-hotel-spa-4.html#?fromCity=2&dateFrom=07.09.2017&dateTo=07.09.2017&nightFrom=10&nightTo=12&priceFrom=6000&priceTo=1000000&adults=2&kids=1&meal=all&activeTab=tours
16 4142429772089134798
0 safari_mobile 1500634811224441019 11881159 2017-07-21 2017-07-21 14:00:10 [4091284,6236751,6236754,28111274] 1 ya_undefined 0 yandex.ru yandex organic 1 https://yandex.ru/ Moscow http://m.zzz.ru/hungary/resorts/budapest#?fromCity=2&toCountry=20&toCity=358&dateFrom=29.07.2017&dateTo=29.07.2017&nightFrom=7&nightTo=8&adults=2&hotelClass=all&meal=all&priceFrom=6000&priceTo=1000000&sort=recommend 26 4132545904312059738
2017-07-31 20:25:44 MainProcess WARNING 1 rows were filtered out
2017-07-31 20:25:48 MainProcess CRITICAL Iteration #1 failed
Traceback (most recent call last):
File "metrica_logs_api.py", line 127, in <module>
integrate_with_logs_api(config, user_request)
File "metrica_logs_api.py", line 107, in integrate_with_logs_api
raise e
ValueError: Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected \n before: \tym:s:date\tym:s:dateTime\tym:s:goalsID\tym:s:isNewUser\tym:s:lastAdvEngine\tym:s:lastClickBannerGroupName\tym:s:lastDirectClickOrder\tym:s:lastDirectClickOrderName\tym, e.what() = DB::Exception
Could you please advise what to do in this situation?
Hello!
The configuration file requires a "host" value. Where can I find it in the Yandex.Cloud console? And how do I find out the required port?
I managed to catch a case with a parsing error.
We receive the ImpressionsProductName field from the Logs API.
In ClickHouse this field has the type Array(String) and should be passed as ['Name1', 'Name2', 'Name3']
But when the site has products whose names contain the inch mark, which is quite often written as a double quote, the ImpressionsProductName field is passed to ClickHouse like this:
"['Name 1/2""', 'Name2', 'Name3']"
That is, the data is wrapped in quotes and the field becomes a string, which causes the error.
Intermediate dumps showed that this is how the field arrives from the Logs API.
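The quoting described above looks like CSV-style escaping: the whole field is wrapped in double quotes and inner double quotes are doubled. A hedged sketch of undoing it before the data is sent on is shown below. The helper name unquote_field is hypothetical, and whether ClickHouse accepts the resulting value has not been verified here.

```python
def unquote_field(value):
    # Undo CSV-style quoting: strip the outer double quotes and
    # collapse doubled inner quotes. Unquoted values pass through.
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        return value[1:-1].replace('""', '"')
    return value


raw = '''"['Name 1/2""', 'Name2', 'Name3']"'''
print(unquote_field(raw))
```

One limitation of this sketch: a field whose real content begins and ends with a double quote would also be stripped, so it should only be applied to fields known to be arrays.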
I get max_possible_day_quantity less than 0.
python3 metrica_logs_api.py -source hits -start_date 2019-08-01 -end_date 2019-08-19
##### python 3.7.3
2019-08-20 12:38:59 MainProcess INFO CLI Options: Namespace(end_date='2019-08-19', mode=None, source='hits', start_date='2019-08-01')
2019-08-20 12:38:59 MainProcess INFO UserRequest(token='%removed%', counter_id='%removed%', start_date_str='2019-08-01', end_date_str='2019-08-19', source='hits', fields=('ym:pv:watchID',))
2019-08-20 12:38:59 MainProcess INFO get_estimation https://api-metrika.yandex.ru/management/v1/counter/%removed%/logrequests/evaluate?date1=2019-08-01&date2=2019-08-19&source=hits&fields=ym%3Apv%3AwatchID
2019-08-20 12:39:01 MainProcess INFO get_estimation {"log_request_evaluation":{"possible":false,"max_possible_day_quantity":-6}}
2019-08-20 12:39:01 MainProcess INFO days = 18
2019-08-20 12:39:01 MainProcess INFO max_possible_day_quantity = -6
2019-08-20 12:39:01 MainProcess INFO num_requests = -2
2019-08-20 12:39:01 MainProcess INFO days_in_period = -8
2019-08-20 12:39:01 MainProcess INFO ### TOTAL TIME: 0 minutes 1 seconds
Is this normal API behavior?
The Date field is not mandatory in the field configuration.
However, the is_data_present method runs a database query that assumes this field exists and raises an exception when it is absent. As a result, the second and all subsequent requests without dates fail.
Usage scenario:
Accordingly,
If you tell me which option is preferable, I am ready to prepare the code for the fix.
Good afternoon! Some time ago, exporting both visits and hits stopped working. Before that, everything worked fine. I dropped the tables.
My config.json:
{
"token" : "",
"counter_id": "",
"visits_fields": [
"ym:s:counterID",
"ym:s:dateTime",
"ym:s:date",
"ym:s:clientID"
],
"hits_fields": [
"ym:pv:dateTime",
"ym:pv:date",
"ym:pv:clientID",
"ym:pv:URL", "ym:pv:regionCity", "ym:pv:params"
],
"log_level": "INFO",
"retries": 1,
"retries_delay": 60,
"clickhouse": {
"host": "http://localhost:8123",
"user": "",
"password": "",
"visits_table": "visits_all",
"hits_table": "hits_all",
"database": "default"
}
}
Log:
2018-07-09 10:50:18 MainProcess INFO CLI Options: Namespace(end_date='2018-07-04', mode=None, source='visits', start_date='2018-06-22')
2018-07-09 10:50:18 MainProcess INFO UserRequest(token='', counter_id='', start_date_str='2018-06-22', end_date_str='2018-07-04', source='visits', fields=('ym:s:counterID', 'ym:s:dateTime', 'ym:s:date', 'ym:s:clientID'))
2018-07-09 10:50:18 MainProcess INFO ### CREATING TASK
{
"date1_str": "2018-06-22",
"date2_str": "2018-07-04",
"request_id": 970132,
"status": "created",
"user_request": [
"",
"",
"2018-06-22",
"2018-07-04",
"visits",
[
"ym:s:counterID",
"ym:s:dateTime",
"ym:s:date",
"ym:s:clientID"
]
]
}
2018-07-09 10:50:19 MainProcess INFO ### DELAY 20 secs
2018-07-09 10:50:39 MainProcess INFO ### CHECKING STATUS
2018-07-09 10:50:40 MainProcess INFO API Request status: created
2018-07-09 10:50:40 MainProcess INFO ### DELAY 20 secs
2018-07-09 10:51:00 MainProcess INFO ### CHECKING STATUS
2018-07-09 10:51:00 MainProcess INFO API Request status: created
2018-07-09 10:51:00 MainProcess INFO ### DELAY 20 secs
2018-07-09 10:51:20 MainProcess INFO ### CHECKING STATUS
2018-07-09 10:51:21 MainProcess INFO API Request status: created
2018-07-09 10:51:21 MainProcess INFO ### DELAY 20 secs
2018-07-09 10:51:41 MainProcess INFO ### CHECKING STATUS
2018-07-09 10:51:42 MainProcess INFO API Request status: created
2018-07-09 10:51:42 MainProcess INFO ### DELAY 20 secs
2018-07-09 10:52:02 MainProcess INFO ### CHECKING STATUS
2018-07-09 10:52:02 MainProcess INFO API Request status: created
2018-07-09 10:52:02 MainProcess INFO ### DELAY 20 secs
2018-07-09 10:52:22 MainProcess INFO ### CHECKING STATUS
2018-07-09 10:52:22 MainProcess INFO API Request status: processed
2018-07-09 10:52:22 MainProcess INFO ### SAVING DATA
2018-07-09 10:52:22 MainProcess INFO Part #0
2018-07-09 10:52:22 MainProcess INFO ### DATA SAMPLE
2018-07-09 10:52:22 MainProcess INFO ym:s:clientID ym:s:counterID ym:s:date ym:s:dateTime
15303468821043518302 47277879 2018-06-30 2018-06-30 12:41:37
1526622689952710585 47277879 2018-07-04 2018-07-04 09:48:06
1529832486417108325 47277879 2018-06-30 2018-06-30 11:58:54
0 47277879 2018-07-04 2018-07-04 04:36:52
2018-07-09 10:52:22 MainProcess WARNING 1 rows were filtered out
2018-07-09 10:52:22 MainProcess INFO Database created
2018-07-09 10:52:23 MainProcess INFO Table created
2018-07-09 10:52:23 MainProcess CRITICAL Iteration #1 failed
Traceback (most recent call last):
File "metrica_logs_api.py", line 127, in <module>
integrate_with_logs_api(config, user_request)
File "metrica_logs_api.py", line 107, in integrate_with_logs_api
raise e
File "metrica_logs_api.py", line 100, in integrate_with_logs_api
logs_api.save_data(api_request, part)
File "/usr/yam/logs_api.py", line 174, in save_data
output_data)
File "/usr/yam/clickhouse.py", line 152, in save_data
create_table(source, fields)
File "/usr/yam/clickhouse.py", line 132, in create_table
field_statements.append(field_tmpl.format(name= ch_fields[i],
TypeError: 'map' object is not subscriptable
Most of the time when I try to use this, it keeps returning
Starting new HTTPS connection (1): api-metrika.yandex.com
2019-07-03 16:12:20 MainProcess INFO API Request status: created
2019-07-03 16:12:20 MainProcess INFO ### DELAY 20 secs
even for 20 minutes, and I have to force quit it. But sometimes it finishes in 5 minutes.
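A minimal sketch of a status-polling loop with an upper bound on the total wait, so that a request stuck in "created" ends with an error instead of requiring a force quit. The function name wait_for_status and its parameters are hypothetical. Only the 20-second delay and the "created" / "processed" status values come from the logs above.

```python
import time


def wait_for_status(get_status, delay=20, max_wait=1200, sleep=time.sleep):
    # Poll until the request is processed or max_wait seconds have passed.
    waited = 0
    while True:
        status = get_status()
        if status == "processed":
            return status
        if waited >= max_wait:
            raise RuntimeError(
                "Request still '%s' after %d seconds" % (status, waited)
            )
        sleep(delay)
        waited += delay


# Usage with a fake status source and no real sleeping.
statuses = iter(["created", "created", "processed"])
print(wait_for_status(lambda: next(statuses), sleep=lambda seconds: None))
```

Passing sleep as a parameter is a design choice that keeps the loop testable without waiting in real time.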
I am trying to select all visit data for a couple of days
python metrica_logs_api.py -source visits -start_date 2017-03-01 -end_date 2017-03-03
The config looks like this
{ "token" : "...", "counter_id": "...", "visits_fields": [ "ym:s:visitID", "ym:s:counterID", "ym:s:watchIDs", "ym:s:date", "ym:s:dateTime", "ym:s:dateTimeUTC", "ym:s:isNewUser", "ym:s:startURL", "ym:s:endURL", "ym:s:pageViews", "ym:s:visitDuration", "ym:s:bounce", "ym:s:ipAddress", "ym:s:params", "ym:s:goalsID", "ym:s:goalsSerialNumber", "ym:s:goalsDateTime", "ym:s:goalsPrice", "ym:s:goalsOrder", "ym:s:goalsCurrency", "ym:s:clientID", "ym:s:lastTrafficSource", "ym:s:lastAdvEngine", "ym:s:lastReferalSource", "ym:s:lastSearchEngineRoot", "ym:s:lastSearchEngine", "ym:s:lastSocialNetwork", "ym:s:lastSocialNetworkProfile", "ym:s:referer", "ym:s:lastDirectClickOrder", "ym:s:lastDirectBannerGroup", "ym:s:lastDirectClickBanner", "ym:s:lastDirectPhraseOrCond", "ym:s:lastDirectPlatformType", "ym:s:lastDirectPlatform", "ym:s:lastDirectConditionType", "ym:s:lastCurrencyID", "ym:s:from", "ym:s:UTMCampaign", "ym:s:UTMContent", "ym:s:UTMMedium", "ym:s:UTMSource", "ym:s:UTMTerm", "ym:s:openstatAd", "ym:s:openstatCampaign", "ym:s:openstatService", "ym:s:openstatSource", "ym:s:hasGCLID", "ym:s:regionCountry", "ym:s:regionCity", "ym:s:browserLanguage", "ym:s:browserCountry", "ym:s:clientTimeZone", "ym:s:deviceCategory", "ym:s:mobilePhone", "ym:s:mobilePhoneModel", "ym:s:operatingSystemRoot", "ym:s:operatingSystem", "ym:s:browser", "ym:s:browserMajorVersion", "ym:s:browserMinorVersion", "ym:s:browserEngine", "ym:s:browserEngineVersion1", "ym:s:browserEngineVersion2", "ym:s:browserEngineVersion3", "ym:s:browserEngineVersion4", "ym:s:cookieEnabled", "ym:s:javascriptEnabled", "ym:s:flashMajor", "ym:s:flashMinor", "ym:s:screenFormat", "ym:s:screenColors", "ym:s:screenOrientation", "ym:s:screenWidth", "ym:s:screenHeight", "ym:s:physicalScreenWidth", "ym:s:physicalScreenHeight", "ym:s:windowClientWidth", "ym:s:windowClientHeight", "ym:s:purchaseID", "ym:s:purchaseDateTime", "ym:s:purchaseAffiliation", "ym:s:purchaseRevenue", "ym:s:purchaseTax", "ym:s:purchaseShipping", 
"ym:s:purchaseCoupon", "ym:s:purchaseCurrency", "ym:s:purchaseProductQuantity", "ym:s:productsPurchaseID", "ym:s:productsID", "ym:s:productsName", "ym:s:productsBrand", "ym:s:productsCategory", "ym:s:productsCategory1", "ym:s:productsCategory2", "ym:s:productsCategory3", "ym:s:productsCategory4", "ym:s:productsCategory5", "ym:s:productsVariant", "ym:s:productsPosition", "ym:s:productsPrice", "ym:s:productsCurrency", "ym:s:productsCoupon", "ym:s:productsQuantity", "ym:s:impressionsURL", "ym:s:impressionsDateTime", "ym:s:impressionsProductID", "ym:s:impressionsProductName", "ym:s:impressionsProductBrand", "ym:s:impressionsProductCategory", "ym:s:impressionsProductCategory1", "ym:s:impressionsProductCategory2", "ym:s:impressionsProductCategory3", "ym:s:impressionsProductCategory4", "ym:s:impressionsProductCategory5", "ym:s:impressionsProductVariant", "ym:s:impressionsProductPrice", "ym:s:impressionsProductCurrency", "ym:s:impressionsProductCoupon", "ym:s:lastDirectClickOrderName", "ym:s:lastClickBannerGroupName", "ym:s:lastDirectClickBannerName", "ym:s:networkType" ], "hits_fields": [ "ym:pv:counterID", "ym:pv:dateTime", "ym:pv:date", "ym:pv:firstPartyCookie" ], "log_level": "INFO", "retries": 1, "retries_delay": 60, "clickhouse": { "host": "http://localhost:8123", "user": "...", "password": "...", "visits_table": "visits_all", "hits_table": "hits_all", "database": "..." } }
Received data and errors
2017-03-03 14:42:03 MainProcess INFO ### DATA SAMPLE
2017-03-03 14:42:03 MainProcess INFO ym:s:bounce ym:s:browser ym:s:browserCountry ym:s:browserEngine ym:s:browserEngineVersion1 ym:s:browserEngineVersion2 ym:s:browserEngineVersion3 ym:s:browserEngineVersion4 ym:s:browserLanguage ym:s:browserMajorVersion ym:s:browserMinorVersion ym:s:clientID ym:s:clientTimeZone ym:s:cookieEnabled ym:s:counterID ym:s:date ym:s:dateTime ym:s:dateTimeUTC ym:s:deviceCategory ym:s:endURL ym:s:flashMajor ym:s:flashMinor ym:s:from ym:s:goalsCurrency ym:s:goalsDateTime ym:s:goalsID ym:s:goalsOrder ym:s:goalsPrice ym:s:goalsSerialNumber ym:s:hasGCLID ym:s:impressionsDateTime ym:s:impressionsProductBrand ym:s:impressionsProductCategory ym:s:impressionsProductCategory1 ym:s:impressionsProductCategory2 ym:s:impressionsProductCategory3 ym:s:impressionsProductCategory4 ym:s:impressionsProductCategory5 ym:s:impressionsProductCoupon ym:s:impressionsProductCurrency ym:s:impressionsProductID ym:s:impressionsProductName ym:s:impressionsProductPrice ym:s:impressionsProductVariant ym:s:impressionsURL ym:s:ipAddress ym:s:isNewUser ym:s:javascriptEnabled ym:s:lastAdvEngine ym:s:lastClickBannerGroupName ym:s:lastCurrencyID ym:s:lastDirectBannerGroup ym:s:lastDirectClickBanner ym:s:lastDirectClickBannerName ym:s:lastDirectClickOrder ym:s:lastDirectClickOrderName ym:s:lastDirectConditionType ym:s:lastDirectPhraseOrCond ym:s:lastDirectPlatform ym:s:lastDirectPlatformType ym:s:lastReferalSource ym:s:lastSearchEngine ym:s:lastSearchEngineRoot ym:s:lastSocialNetwork ym:s:lastSocialNetworkProfile ym:s:lastTrafficSource ym:s:mobilePhone ym:s:mobilePhoneModel ym:s:networkType ym:s:openstatAd ym:s:openstatCampaign ym:s:openstatService ym:s:openstatSource ym:s:operatingSystem ym:s:operatingSystemRoot ym:s:pageViews ym:s:params ym:s:physicalScreenHeight ym:s:physicalScreenWidth ym:s:productsBrand ym:s:productsCategory ym:s:productsCategory1 ym:s:productsCategory2 ym:s:productsCategory3 ym:s:productsCategory4 ym:s:productsCategory5 ym:s:productsCoupon 
ym:s:productsCurrency ym:s:productsID ym:s:productsName ym:s:productsPosition ym:s:productsPrice ym:s:productsPurchaseID ym:s:productsQuantity ym:s:productsVariant ym:s:purchaseAffiliation ym:s:purchaseCoupon ym:s:purchaseCurrency ym:s:purchaseDateTime ym:s:purchaseID ym:s:purchaseProductQuantity ym:s:purchaseRevenue ym:s:purchaseShipping ym:s:purchaseTax ym:s:referer ym:s:regionCity ym:s:regionCountry ym:s:screenColors ym:s:screenFormat ym:s:screenHeight ym:s:screenOrientation ym:s:screenWidth ym:s:startURL ym:s:UTMCampaign ym:s:UTMContent ym:s:UTMMedium ym:s:UTMSource ym:s:UTMTerm ym:s:visitDuration ym:s:visitID ym:s:watchIDs ym:s:windowClientHeight ym:s:windowClientWidth
0 chrome 0 WebKit 537 36 0 0 30066 56 0 1488484141656852122 180 1 421224 2017-03-02 2017-03-02 22:48:55 2017-03-02 22:48:55 1 https://... 0 0 [0] ['2017-03-02 22:49:58'] [14196793]...
....
2017-03-03 14:42:03 MainProcess CRITICAL Iteration #1 failed
Traceback (most recent call last):
File "metrica_logs_api.py", line 127, in <module>
integrate_with_logs_api(config, user_request)
File "metrica_logs_api.py", line 107, in integrate_with_logs_api
raise e
ValueError: Code: 26, e.displayText() = DB::Exception: Cannot parse quoted string: expected opening quote: (at row 1)
Row 1:
Column 0, name: Bounce, type: UInt8, parsed text: "0"
Column 1, name: Browser, type: String, parsed text: "chrome"
Column 2, name: BrowserCountry, type: String, parsed text: "0"
Column 3, name: BrowserEngine, type: String, parsed text: "WebKit"
Column 4, name: BrowserEngineVersion1, type: UInt16, parsed text: "537"
Column 5, name: BrowserEngineVersion2, type: UInt16, parsed text: "36"
Column 6, name: BrowserEngineVersion3, type: UInt16, parsed text: "0"
Column 7, name: BrowserEngineVersion4, type: UInt16, parsed text: "0"
Column 8, name: BrowserLanguage, type: String, parsed text: "30066"
Column 9, name: BrowserMajorVersion, type: UInt16, parsed text: "56"
Column 10, name: BrowserMinorVersion, type: UInt16, parsed text: "0"
Column 11, name: ClientID, type: UInt64, parsed text: "1488484141656852122"
Column 12, name: ClientTimeZone, type: Int16, parsed text: "180"
Column 13, name: CookieEnabled, type: UInt8, parsed text: "1"
Column 14, name: CounterID, type: UInt32, parsed text: "421224"
Column 15, name: Date, type: Date, parsed text: "2017-03-02"
Column 16, name: DateTime, type: DateTime, parsed text: "2017-03-02 22:48:55"
Column 17, name: DateTimeUTC, type: DateTime, parsed text: "2017-03-02 22:48:55"
Column 18, name: DeviceCategory, type: String, parsed text: "1"
Column 19, name: EndURL, type: String, parsed text: "https://..."
Column 20, name: FlashMajor, type: UInt8, parsed text: "0"
Column 21, name: FlashMinor, type: UInt8, parsed text: "0"
Column 22, name: From, type: String, parsed text:
Column 23, name: GoalsCurrency, type: Array(String), parsed text: "["ERROR
, e.what() = DB::Exception
`
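The traceback stops at column 23, where ClickHouse expects `Array(String)` but the row appears to carry an unquoted array such as `[0]`. Below is a minimal sketch of one possible workaround: quoting the elements before the insert. It assumes the field is a flat bracketed list with no commas or quotes inside the elements. `quote_string_array` is a hypothetical helper and is not part of the script.

```python
def quote_string_array(raw):
    """Wrap every unquoted element of a flat bracketed list in single quotes.

    Hypothetical helper for illustration only: it assumes elements contain
    no commas, so values such as ['a,b'] are out of scope.
    """
    inner = raw.strip()[1:-1].strip()
    if not inner:
        return "[]"  # empty array stays empty
    parts = [p.strip() for p in inner.split(",")]
    quoted = []
    for p in parts:
        if p.startswith("'") and p.endswith("'") and len(p) >= 2:
            quoted.append(p)  # already quoted, keep as-is
        else:
            quoted.append("'" + p.replace("'", "\\'") + "'")
    return "[" + ",".join(quoted) + "]"


if __name__ == "__main__":
    print(quote_string_array("[0]"))
    print(quote_string_array("['2017-03-02 22:49:58']"))
```

Already-quoted elements are left alone, so the helper could be applied to every `Array(String)` column without double-quoting values that were correct to begin with.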
The point is that LastDirectClickOrder in the received data may contain either a number or an empty value, so it cannot be a numeric column in the database.
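As a rough illustration of that rule, the sketch below picks a column type from sample values and falls back to `String` as soon as any value is empty or non-numeric. The function name `column_type_for` and the choice of `UInt64` as the numeric type are assumptions made for the example; the script itself does not contain this logic.

```python
import re

_DIGITS = re.compile(r"[0-9]+")


def column_type_for(samples):
    """Return 'UInt64' only if every sample is a non-empty run of ASCII digits.

    Hypothetical helper: any empty or non-numeric sample forces 'String',
    which mirrors the LastDirectClickOrder case described above.
    """
    if samples and all(_DIGITS.fullmatch(s) for s in samples):
        return "UInt64"
    return "String"


if __name__ == "__main__":
    print(column_type_for(["12345", "67890"]))
    print(column_type_for(["12345", ""]))
```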
OfflineCallFirstTimeCaller - Array(UInt32)
That is wrong. Example value: "[-1,-1,-1,-1,-1,-1,-1,-1]"
It should be: "ym:s:offlineCallFirstTimeCaller": "Array(Int8)",
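The reasoning here is a range check: an unsigned type cannot hold `-1`, and `Int8` (-128 to 127) is the narrowest signed type that can. A small sketch of that check follows, assuming the field arrives as a flat bracketed list of integers. `smallest_int_array_type` is a hypothetical helper, and the result depends only on the sample passed in, so a type chosen this way would still need to be checked against the full range the API can return.

```python
# ClickHouse integer types with their inclusive ranges, narrowest first.
INT_TYPES = [
    ("UInt8", 0, 2**8 - 1),
    ("Int8", -2**7, 2**7 - 1),
    ("UInt16", 0, 2**16 - 1),
    ("Int16", -2**15, 2**15 - 1),
    ("UInt32", 0, 2**32 - 1),
    ("Int32", -2**31, 2**31 - 1),
    ("UInt64", 0, 2**64 - 1),
    ("Int64", -2**63, 2**63 - 1),
]


def smallest_int_array_type(raw):
    """Return the narrowest Array(<int type>) that fits every value in raw.

    Hypothetical helper: raw is a string such as "[-1,-1,-1]".
    """
    inner = raw.strip()[1:-1].strip()
    values = [int(v) for v in inner.split(",")] if inner else [0]
    lo, hi = min(values), max(values)
    for name, type_min, type_max in INT_TYPES:
        if type_min <= lo and hi <= type_max:
            return "Array(%s)" % name
    raise ValueError("values do not fit any 64-bit integer type")


if __name__ == "__main__":
    print(smallest_int_array_type("[-1,-1,-1,-1,-1,-1,-1,-1]"))
```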