ValueError: Found input variables with inconsistent numbers of samples

saoko · Jun-16-2022, 06:59 PM

I'm trying to write a bounding box regression training script with Keras and TensorFlow for object detection. I have a dataset of 3153 images (.jpg files) and a txt file of bounding box annotations with 6430 lines (some pictures have multiple bounding boxes). Here is part of the txt file, to show what it looks like:

2007_000027 101 174 351 349
2007_000032 180 195 229 213
2007_000032 189 26 238 44
2007_000129 1 74 462 272
2007_000129 19 252 487 334
2007_000170 91 3 206 43
2007_000170 28 4 372 461
2007_000272 71 25 500 304
2007_000323 3 277 375 500
2007_000323 3 12 375 305
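
To make the "multiple boxes per picture" point concrete, here is a minimal sketch that groups lines in this format by image name. It uses only the ten sample rows above; `SAMPLE` and `group_boxes` are hypothetical names made up for illustration and are not part of the training script.

```python
from collections import defaultdict

# The ten sample rows from the annotation file shown above.
SAMPLE = """\
2007_000027 101 174 351 349
2007_000032 180 195 229 213
2007_000032 189 26 238 44
2007_000129 1 74 462 272
2007_000129 19 252 487 334
2007_000170 91 3 206 43
2007_000170 28 4 372 461
2007_000272 71 25 500 304
2007_000323 3 277 375 500
2007_000323 3 12 375 305
"""


def group_boxes(text):
    """Map each image name to the list of (startX, startY, endX, endY) boxes."""
    boxes = defaultdict(list)
    for line in text.strip().splitlines():
        filename, *coords = line.split()
        boxes[filename].append(tuple(int(c) for c in coords))
    return dict(boxes)


grouped = group_boxes(SAMPLE)
total_boxes = sum(len(b) for b in grouped.values())
print(len(grouped), "images,", total_boxes, "boxes")
```

Counting this way on the full file should show why 3153 images can produce 6430 annotation lines.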

I created a configuration file that stores the paths to some files and a few training settings:

import os

BASE_PATH = "dataset"
IMAGES_PATH = os.path.sep.join([BASE_PATH, "images"])
ANNOTS_PATH = os.path.sep.join([BASE_PATH, "bboxes.txt"])

BASE_OUTPUT = "output"
MODEL_PATH = os.path.sep.join([BASE_OUTPUT, "detector.h5"])
PLOT_PATH = os.path.sep.join([BASE_OUTPUT, "plot.png"])
TEST_FILENAMES = os.path.sep.join([BASE_OUTPUT, "test_images.txt"])

INIT_LR = 1e-4
NUM_EPOCHS = 25
BATCH_SIZE = 32

The second file contains the code that trains on my data:

print("INFO - loading dataset...")
rows = open(config.ANNOTS_PATH).read().strip().split("\n")
data = []
targets = []
filenames = []

for row in rows: 
    row = row.split(' ')
    (filename, startX, startY, endX, endY) = row
    suffix = ".jpg"
    imagePath = os.path.sep.join([config.IMAGES_PATH, filename+suffix])
    image = cv2.imread(imagePath)
    (h, w) = image.shape[:2]

    startX = float(startX) / w
    startY = float(startY) / h
    endX = float(endX) / w
    endY = float(endY) / h

    image = load_img(imagePath, target_size=(224, 224))
    image = img_to_array(image)

    data.append(image)
    targets.append((startX, startY, endX, endY))
    filenames.append

data = np.array(data, dtype="float32") / 255.0
targets = np.array(targets, dtype="float32")

split = train_test_split(data, targets, filenames, test_size=0.10, random_state=42)

(trainImages, testImages) = split[:2]
(trainTargets, testTargets) = split[2:4]
(trainFilenames, testFilenames) = split[4:]

print("INFO - saving testing filenames...")
f = open(config.TEST_FILENAMES, "w")
f.write("\n".join(testFilenames))
f.close()

vgg = VGG16(weights="imagenet", include_top=False, input_tensor=Input(shape=(224, 224, 3)))
vgg.trainable = False

flatten = vgg.output
flatten = Flatten()(flatten)

bboxHead = Dense(128, activation="relu")(flatten)
bboxHead = Dense(64, activation="relu")(bboxHead)
bboxHead = Dense(32, activation="relu")(bboxHead)
bboxHead = Dense(4, activation="sigmoid")(bboxHead)

model = Model(inputs=vgg.input, outputs=bboxHead)

opt = Adam(lr=config.INIT_LR)
model.compile(loss="mse", optimizer=opt)
print(model.summary())

print("INFO - training bounding box regressor...")
H = model.fit(
    trainImages, trainTargets, 
    validation_data=(testImages, testTargets), 
    batch_size=config.BATCH_SIZE, epochs=config.NUM_EPOCHS, verbose=1)

print("INFO - saving objects detector model...")
model.save(config.MODEL_PATH, save_format="h5")

N = config.NUM_EPOCHS
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
plt.title("Bounding box regression loss on training set")
plt.xlabel("Epoch #")
plt.ylabel("Loss")
plt.legend(loc="lower left")
plt.savefig(config.PLOT_PATH)
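
The loop divides each corner by the image width and height, so the targets fall between 0 and 1, the same range as the sigmoid output of the last Dense layer. A minimal sketch of that scaling on its own, assuming a hypothetical image size of 500 x 500 (the real sizes come from cv2.imread and are not shown in this post); `normalize_box` is an illustrative helper, not part of the script:

```python
def normalize_box(start_x, start_y, end_x, end_y, width, height):
    """Scale pixel coordinates to the 0..1 range, as the loading loop does."""
    return (start_x / width, start_y / height, end_x / width, end_y / height)


# First sample row, with an ASSUMED image size of 500 x 500 pixels.
box = normalize_box(101, 174, 351, 349, width=500, height=500)
print(box)

# Every value should be usable as a target for a sigmoid output.
in_range = all(0.0 <= v <= 1.0 for v in box)
print("all values in [0, 1]:", in_range)
```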

When I run my code, I get the following error:

Traceback (most recent call last):
  File "/Users/username/Downloads/od/train.py", line 47, in <module>
    split = train_test_split(data, targets, filenames, test_size=0.10, random_state=42)
  File "/usr/local/lib/python3.9/site-packages/sklearn/model_selection/_split.py", line 2430, in train_test_split
    arrays = indexable(*arrays)
  File "/usr/local/lib/python3.9/site-packages/sklearn/utils/validation.py", line 433, in indexable
    check_consistent_length(*result)
  File "/usr/local/lib/python3.9/site-packages/sklearn/utils/validation.py", line 387, in check_consistent_length
    raise ValueError(
ValueError: Found input variables with inconsistent numbers of samples: [6430, 6430, 0]

I understand that the number of lines is not equal to the number of images, but I can't change the data in the txt file. Can someone help me correct this code so that I can train on my data properly?
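
One thing worth checking first: the three numbers in the message, [6430, 6430, 0], appear to be the lengths of data, targets and filenames, in the order they are passed to train_test_split. The first two already match, and it is the third list that is empty. In the loading loop, `filenames.append` is written without parentheses or an argument, so it only looks up the method and never adds anything. A minimal sketch of that effect with plain lists; `collect` and `lengths` are hypothetical helpers for illustration, and `lengths` is only a stand-in for the consistency check, not scikit-learn's actual code:

```python
# Three sample annotation rows in the same format as the txt file.
rows = [
    "2007_000027 101 174 351 349",
    "2007_000032 180 195 229 213",
    "2007_000032 189 26 238 44",
]


def collect(rows, call_append):
    """Build the targets and filenames lists, with or without calling append."""
    targets, filenames = [], []
    for row in rows:
        filename, *coords = row.split(" ")
        targets.append(tuple(float(c) for c in coords))
        if call_append:
            filenames.append(filename)  # adds one entry per annotation row
        else:
            filenames.append  # only looks up the method; nothing is added
    return targets, filenames


def lengths(*arrays):
    """Hypothetical stand-in for the length check; not scikit-learn's code."""
    return [len(a) for a in arrays]


broken = lengths(*collect(rows, call_append=False))
fixed = lengths(*collect(rows, call_append=True))
print("without the call:", broken)
print("with the call:", fixed)
```

With `filenames.append(filename)` in the loop, all three lists should have one entry per annotation line (6430 here), so the length check should pass without changing the txt file. Each image with several boxes would then appear once per box, and because the split is done per line, the same image could end up in both the training and the test set.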

Thanks!
