practical-data-science / ecommercetools

EcommerceTools is a Python data science toolkit for ecommerce, marketing science, and technical SEO analysis and modelling. It was created by Matt Clarke.

License: MIT License

Python 60.42% Jupyter Notebook 39.58%
customer customers ecommerce marketing marketing-analytics marketing-tools retail seo seo-optimization seotools

ecommercetools's People

Contributors

flyandlure, practical-data-science

ecommercetools's Issues

Feature request - Add support for other languages when calling google_autocomplete()

The google_autocomplete() function currently supports only en, because the expanded suggestion data are in English (and I don't reliably speak other languages). Extend support to other languages...

Here are some prefixes. I'm sending two lists: one with accents and one without. In Spanish, people tend to search without accents on desktop and with accents on mobile (because of auto-correction), so feel free to choose.

("quien es *", "qué es *","dónde", "dónde esta *", "cuándo *", "por qué *", "cómo *", "mejor", "barato", "peor", "es", "qué", "cuando", "quién","cuál es","cuánto",)

("quien es *", "que es *","donde", "donde esta *", "cuando *", "por que", "como *", "mejor", "barato", "peor", "es *", "que", "cuando", "quien","cual es","cuanto",)

Please do let me know if I can be of further help :)

Best,
Juan
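Since the two lists above differ only in accents, the unaccented variant could be derived programmatically rather than maintained by hand; a minimal sketch using only the standard library (strip_accents is a hypothetical helper, not part of the package):

```python
import unicodedata

def strip_accents(text: str) -> str:
    # Decompose characters (NFD), then drop the combining marks,
    # turning "qué es *" into "que es *".
    return "".join(ch for ch in unicodedata.normalize("NFD", text)
                   if unicodedata.category(ch) != "Mn")

accented = ("quien es *", "qué es *", "dónde", "dónde esta *", "cuándo *")
unaccented = tuple(strip_accents(p) for p in accented)
```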

SKU counter in get_customers not counting unique SKUs

Hello, amazing tool!

I tried to use it for my business but found that the skus column (which I understood to be the number of unique SKUs purchased by each customer) does not aggregate as such.

For example, using .load_sample_data(), customer 12347 has 103 different SKUs, but the tool's skus column reports 7.

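One plausible cause of such a mismatch is aggregating with a row count rather than a distinct count. A minimal illustration of the distinction in pandas, on invented toy data (not the actual get_customers implementation):

```python
import pandas as pd

# Toy transactions; customer 12347 buys SKU "A" twice and "B" once.
orders = pd.DataFrame({
    "customer_id": [12347, 12347, 12347, 12348],
    "sku": ["A", "A", "B", "C"],
})

# nunique() counts distinct SKUs per customer ...
unique_skus = orders.groupby("customer_id")["sku"].nunique()
# ... whereas count() counts purchase lines (rows).
line_counts = orders.groupby("customer_id")["sku"].count()
```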

seo.query_google_search_console can't read JSON client secret file

I generated a JSON client secret from the Google API, then ran the Google Search Console command:

df = seo.query_google_search_console(key, site_url, payload)

The terminal responds:

Error: Service account info was not in the expected format, missing fields token_uri, client_email.
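That error typically means the downloaded file is an OAuth client secret rather than a service-account key, since only the latter contains token_uri and client_email. A quick, hedged check one could run before calling the function (missing_key_fields is a hypothetical helper, not part of the package):

```python
import json

# Fields a service-account key must contain; their absence produces the
# "missing fields token_uri, client_email" error from google-auth.
REQUIRED_FIELDS = {"client_email", "token_uri", "private_key"}

def missing_key_fields(path: str) -> set:
    """Return the required fields absent from a downloaded key file."""
    with open(path) as f:
        return REQUIRED_FIELDS - json.load(f).keys()
```

If client_email or token_uri are reported missing, download a service-account key from the Google Cloud console instead of the OAuth client secret.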

_get_xml() can fail when a server requires a user agent

Change seo.py...

import pandas as pd
import urllib.request
from urllib.request import Request
from urllib.parse import urlparse
from bs4 import BeautifulSoup


def _get_xml(url: str):
    """Scrapes an XML sitemap from the provided URL and returns XML source.
    Args:
        url (string): Fully qualified URL pointing to XML sitemap.
    Returns:
        xml (string): XML source of scraped sitemap.
    """

    try:
        response = urllib.request.urlopen(Request(url, headers={'User-Agent': 'Mozilla'}))
        xml = BeautifulSoup(response,
                            'lxml-xml',
                            from_encoding=response.info().get_param('charset'))
        return xml
    except Exception as e:
        print("Error: ", e)
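The key change above is attaching a User-Agent header to the request. As a quick check that the header really is set (note that urllib normalises header names internally, so it must be looked up as "User-agent"):

```python
from urllib.request import Request

# Build the same request the patched _get_xml() would send; the URL is
# just a placeholder for illustration.
req = Request("https://example.com/sitemap.xml",
              headers={"User-Agent": "Mozilla"})
# urllib capitalises header keys, so query with "User-agent".
assert req.get_header("User-agent") == "Mozilla"
```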

Notice of archival

Matt Clarke, the creator of this project and owner of this repo, has unfortunately passed away.

No new work will be happening in this repo, and it should be considered archived and read-only.

Forks are welcomed and encouraged to keep this project alive.

Thank you for your understanding.

seo.get_indexed_pages can't run

When I run this command:
from ecommercetools import seo

urls = ['https://www.bbc.co.uk']
df = seo.get_indexed_pages(urls)
print(df.head())

it responds:

Traceback (most recent call last):
File "....../get-index-value.py", line 7, in
df = seo.get_indexed_pages(urls)
File "....../lib/python3.8/site-packages/ecommercetools/seo/google_search.py", line 88, in get_indexed_pages
site_data = {'url': site, 'indexed_pages': _count_indexed_pages(site)}
File "....../lib/python3.8/site-packages/ecommercetools/seo/google_search.py", line 73, in _count_indexed_pages
return _parse_site_results(response)
File "....../lib/python3.8/site-packages/ecommercetools/seo/google_search.py", line 58, in _parse_site_results
indexed = int(string.split(' ')[1].replace(',', ''))
ValueError: invalid literal for int() with base 10: '43.500.000'

Need your help. Thanks.
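The traceback shows Google returning the result count with European thousands separators ('43.500.000'), which int() rejects after only commas are stripped. A minimal, hedged fix would be to drop every non-digit character before converting; parse_result_count is a hypothetical helper sketching the idea, not the package's code:

```python
import re

def parse_result_count(token: str) -> int:
    # Google localises the "About N results" figure, so the thousands
    # separator may be ",", "." or a space; keep only the digits.
    return int(re.sub(r"\D", "", token))

parse_result_count("43.500.000")  # -> 43500000
parse_result_count("43,500,000")  # -> 43500000
```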

response from _get_results(query) contains NoneType, which leads to a parsing failure

Hi Matt,

Trying to scrape Google, I followed your blog post on three-line Google scraping and got the following error:

AttributeError                            Traceback (most recent call last)
Cell In[2], line 1
----> 1 results = seo.get_serps("stupid")
      2 print(results)
File c:\Users\stephan.rudolph\Coding\testenv\Lib\site-packages\ecommercetools\seo\google_search.py:144, in get_serps(query, output)
    133 """Return the first 10 Google search results for a given query.
    134 
    135 Args:
   (...)
    140     results (dict): Results of query.
    141 """
    143 response = _get_results(query)
--> 144 results = _parse_search_results(response)
    146 if results:
    147     if output == "dataframe":

File c:\Users\stephan.rudolph\Coding\testenv\Lib\site-packages\ecommercetools\seo\google_search.py:124, in _parse_search_results(response)
    118 output = []
    120 for result in results:
    121     item = {
    122         'title': result.find(css_identifier_title, first=True).text,
    123         'link': result.find(css_identifier_link, first=True).attrs['href'],
--> 124         'text': result.find(css_identifier_text, first=True).text
...
    125     }
    127     output.append(item)
    129 return output

AttributeError: 'NoneType' object has no attribute 'text'

Then I tried your other blog post on scraping with Python, which does not rely on the ecommercetools package, and followed it to the letter.
Here is the interesting part:

results = google_search("stupid")
results

yields normal output. Rerunning this (Jupyter cell) with the keyword

results = google_search("allergy")
results

yields

AttributeError                            Traceback (most recent call last)
Cell In[9], line 1
----> 1 results = google_search("allergy")
      2 results

Cell In[8], line 3, in google_search(query)
      1 def google_search(query):
      2     response = get_results(query)
----> 3     return parse_results(response)

Cell In[7], line 17, in parse_results(response)
     10 output = []
     12 for result in results:
     14     item = {
     15         'title': result.find(css_identifier_title, first=True).text,
     16         'link': result.find(css_identifier_link, first=True).attrs['href'],
---> 17         'text': result.find(css_identifier_text, first=True).text
     18     }
     20     output.append(item)
     22 return output

AttributeError: 'NoneType' object has no attribute 'text'

So sometimes result.find(css_identifier_text, first=True) yields None.
I have no idea under which circumstances this None arises, but the behaviour is as follows:
seo.get_serps() from ecommercetools consistently throws the error, while the hand-written equivalent is keyword-sensitive; e.g. "allergy" throws the error but "keyword sensitive" does not.
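find(..., first=True) returns None whenever the CSS selector matches nothing, and some SERP result blocks (featured snippets, for instance) have no description element, so dereferencing .text fails for those keywords. A hedged sketch of a defensive version of the loop, with parse_search_results as a hypothetical stand-in for the package's _parse_search_results:

```python
def parse_search_results(results, css_title, css_link, css_text):
    output = []
    for result in results:
        title = result.find(css_title, first=True)
        link = result.find(css_link, first=True)
        text = result.find(css_text, first=True)
        if title is None or link is None:
            continue  # skip result blocks missing a title or link
        output.append({
            'title': title.text,
            'link': link.attrs['href'],
            # fall back to an empty string when no description exists
            'text': text.text if text is not None else '',
        })
    return output
```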

Scikit dependency deprecated in pip

There's a dependency error when pip tries to install the sklearn package. The dependency should be updated to scikit-learn.

× Building wheel for sklearn (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [18 lines of output]
The 'sklearn' PyPI package is deprecated, use 'scikit-learn'
rather than 'sklearn' for pip commands.

  Here is how to fix this error in the main use cases:
  - use 'pip install scikit-learn' rather than 'pip install sklearn'
  - replace 'sklearn' by 'scikit-learn' in your pip requirements files
    (requirements.txt, setup.py, setup.cfg, Pipfile, etc ...)
  - if the 'sklearn' package is used by one of your dependencies,
    it would be great if you take some time to track which package uses
    'sklearn' instead of 'scikit-learn' and report it to their issue tracker
  - as a last resort, set the environment variable
    SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True to avoid this error
  
  More information is available at
  https://github.com/scikit-learn/sklearn-pypi-package
  
  If the previous advice does not cover your use case, feel free to report it at
  https://github.com/scikit-learn/sklearn-pypi-package/issues/new
  [end of output]
