Comments (9)
Was the image corrupted during download?
from concept.
That does sound familiar. I remember having trouble with some of the images when I started developing the package, but I cannot find the code I used to fix it. I believe you are correct that some of the images are either corrupted or in a different format.
I would have to do some digging but it indeed seems that this might not be the best dataset to be used unless some images are removed...
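A minimal sketch of how problem images could be separated out before fitting, assuming Pillow is installed. The helper name `split_readable` is hypothetical and not part of Concept. Pillow's `verify()` only catches some kinds of damage, so a file that passes may still fail later during decoding.

```python
from PIL import Image


def split_readable(paths):
    """Return (readable, unreadable) lists of file paths.

    Hypothetical helper: a file counts as unreadable if Pillow cannot
    identify it or if its integrity check raises.
    """
    readable, unreadable = [], []
    for path in paths:
        try:
            with Image.open(path) as img:
                img.verify()  # raises on files Pillow detects as broken
            readable.append(path)
        except Exception:
            unreadable.append(path)
    return readable, unreadable
```

The `readable` list could then be passed on in place of the full list of file names, and `unreadable` inspected by hand.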
from concept.
Not an elegant solution ... but it was quick:
# Prepare images
batch_size = 1
nr_iterations = int(np.ceil(len(images) / batch_size))
# Embed images per batch
embeddings = []
for i in tqdm(range(nr_iterations)):
    start_index = i * batch_size
    end_index = (i * batch_size) + batch_size
    images_to_embed = [Image.open(filepath) for filepath in images[start_index:end_index]]
    try:
        img_emb = self.embedding_model.encode(images_to_embed, show_progress_bar=False)
        embeddings.extend(img_emb.tolist())
    except Exception:
        # Skip any batch that fails to encode and report its file names
        print(f"Skipping: {images[start_index:end_index]}")
    finally:
        # Close images whether or not encoding succeeded
        for image in images_to_embed:
            image.close()
return np.array(embeddings)
from concept.
Is this what you get?
from concept.
I just tried out the following code in a Kaggle session and I had no issues:
import os
import glob
import zipfile
from tqdm import tqdm
from sentence_transformers import util
from concept import ConceptModel
# 25k images from Unsplash
img_folder = 'photos/'
if not os.path.exists(img_folder) or len(os.listdir(img_folder)) == 0:
    os.makedirs(img_folder, exist_ok=True)
    photo_filename = 'unsplash-25k-photos.zip'
    if not os.path.exists(photo_filename):  # Download dataset if does not exist
        util.http_get('http://sbert.net/datasets/' + photo_filename, photo_filename)
    # Extract all images
    with zipfile.ZipFile(photo_filename, 'r') as zf:
        for member in tqdm(zf.infolist(), desc='Extracting'):
            zf.extract(member, img_folder)
img_names = list(glob.glob('photos/*.jpg'))
# Train model
concept_model = ConceptModel()
concepts = concept_model.fit_transform(img_names)
It might mean that something indeed went wrong when trying to load the images in your environment.
Is this what you get?
Concept uses UMAP, which is stochastic by nature, so you will get different results on every run. Comparing outputs is therefore not easy unless you set a random_state in UMAP, which might hurt performance.
from concept.
It's something in the inner workings of SentenceTransformer('clip-ViT-B-32').
Possibly an issue with tensorflow_macos/tensorflow_metal and torch.
Investigating ...
from concept.
It appears that the images which cause the exceptions in the normalize() method have 4 channels (not 3).
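If the extra channel is an alpha channel, one possible workaround is to convert each image to RGB before it is embedded. This is only a sketch, assuming Pillow; `open_as_rgb` is a hypothetical helper, not something Concept or sentence-transformers provides.

```python
from PIL import Image


def open_as_rgb(filepath):
    """Open an image file and return an in-memory 3-channel RGB copy."""
    with Image.open(filepath) as img:
        # convert() returns a new image, so the file can be closed afterwards
        return img.convert("RGB")


# In-memory illustration: a 4-channel RGBA image becomes a 3-channel RGB image
rgba = Image.new("RGBA", (8, 8), (255, 0, 0, 128))
rgb = rgba.convert("RGB")
print(rgba.mode, len(rgba.getbands()), "->", rgb.mode, len(rgb.getbands()))
```

Note that converting drops the transparency information, which may or may not matter for the resulting embeddings.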
numpy 1.20.3
transformers 4.11.3
sentence_transformers 2.1.0
Are you using the same versions?
These defaults in class CLIPFeatureExtractor(FeatureExtractionMixin, ImageFeatureExtractionMixin) look odd:
    do_normalize (:obj:`bool`, `optional`, defaults to :obj:`True`):
        Whether or not to normalize the input with :obj:`image_mean` and :obj:`image_std`.
    image_mean (:obj:`List[int]`, defaults to :obj:`[0.485, 0.456, 0.406]`):
        The sequence of means for each channel, to be used when normalizing images.
    image_std (:obj:`List[int]`, defaults to :obj:`[0.229, 0.224, 0.225]`):
        The sequence of standard deviations for each channel, to be used when normalizing images.
    ...
    self.do_normalize = do_normalize
    self.image_mean = image_mean if image_mean is not None else [0.48145466, 0.4578275, 0.40821073]
    self.image_std = image_std if image_std is not None else [0.26862954, 0.26130258, 0.27577711]
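To illustrate why a 4-channel image would trip over three-element defaults, here is a small NumPy sketch. It assumes the normalization is a per-channel subtract and divide on a channels-first array; the `normalize` function below is a stand-in written for this example, not the library's actual code.

```python
import numpy as np


def normalize(image, mean, std):
    """Per-channel normalization of a channels-first (C, H, W) array."""
    mean = np.asarray(mean, dtype=np.float32)
    std = np.asarray(std, dtype=np.float32)
    # mean and std are reshaped to (C, 1, 1) so they broadcast over H and W
    return (image - mean[:, None, None]) / std[:, None, None]


mean = [0.48145466, 0.4578275, 0.40821073]
std = [0.26862954, 0.26130258, 0.27577711]

rgb = np.zeros((3, 224, 224), dtype=np.float32)
rgba = np.zeros((4, 224, 224), dtype=np.float32)

normalize(rgb, mean, std)  # 3 channels against 3 means: fine
try:
    normalize(rgba, mean, std)  # 4 channels against 3 means
except ValueError as err:
    print("4-channel input fails:", err)
```

Under that assumption, any image that reaches the normalization step with 4 channels would raise, regardless of which set of default values is used.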
But why doesn't your system generate these exceptions?
from concept.
But why doesn't your system generate these exceptions?
Indeed, this is quite strange as it is working perfectly fine for me in a Kaggle session with the following packages:
transformers==4.15.0
tokenizers==0.10.3
sentence-transformers==1.2.0
numpy==1.20.3
Hopefully, the issue you posted on the CLIP page gives a bit more insight.
from concept.