Computer vision narratives

From DigitalCraft_Wiki

Tutoring

Foreword/Introduction

Abstract

Central Question

What if a machine can impose actions on a human?

How can I suggest that machines, cameras and humans are monitoring what you are doing without being asked?

Can the machine see more than humans do? (create a fake world)

Can I build a world where there is no trace of reality? (AR/VR)

Can I create a machine that creates stories out of what it sees?




What if a machine (can recognize a person and) can give commands to a person?

Relevance of the Topic

Hypothesis

Research Approach

Key References

Netherlands

The police are testing cameras on the highways to check whether people obey the law. Using object recognition, they can recognize the car, the license plate and the windshield. Camera images of drivers who obey the law are destroyed immediately. By Dutch principles that is good, but if you want to improve the object-recognition software, you also need images of drivers who are doing it right, so that you have two groups. It comes down to a choice between technical improvement and the privacy of citizens.

Nos-handsfree-jw.png


China

In China they do it much smarter than in the Netherlands. They keep all the information they collect to improve their systems. Technical improvement has a much higher priority than privacy, so in China's case the decision has already been made.

China-obj-recogntion.jpg


Sleepwet

Literature

4-D internet. The internet as we know it is of course present online, but also offline. For example, you upload something that breaks the rules, and the next day someone is standing on your doorstep. You order food, and within 20 minutes a person on a bike is at your door with your meal. Living this way, the internet cannot be avoided. The confrontation is everywhere. The internet is no longer just a window onto the online world; the entire environment has become the internet. This is how things are today: besides children in the early years of primary school (groep 3/4) being taught in their own mother tongue, it is also becoming necessary to speak the language of the computer. Lessons in coding. I believe this is really important for the next generation, because only then will this new generation not be ignorant of the power machines have in our daily lives. If the new generation does not understand how these machines work, this leads to an


Narrative intelligence. Humans give meaning to the world through stories. People are curious about how other people have lived, and pass such stories on to each other (perhaps adding their own interpretation, so that the story distorts a little each time). To let machines give meaning to the world, many people are also working on having machines read stories, learn to understand them, and then write stories themselves. Narrative can take people somewhere mentally, and that story is different in everyone's head. Everyone gives the story a different color, shape or pattern, and so it generates a reality of its own for that specific person. Narrative can also direct someone's life; think of religion, where every story carries meaning about how one should behave. Machines have no notion of what a story can do. I mentioned The Stanley Parable earlier; in essence it captures the relationship between reader and narrator. That boundary blurs in the game: you are in conversation with each other, not verbally but mentally. This creates a very large web of storylines that seem endless and flow through one another.

Object detection. The problem is not just about solving the 'what?', it's also about solving the 'where?'

The difference between traditional and technical images, then, would be this: the first are observations of objects, the second computations of concepts. The first arise through depiction, the second through a peculiar hallucinatory power that has lost its faith in rules. This essay will discuss that hallucinatory power. - V. Flusser

The Traditional image - observation of objects/concepts - depiction of image dependent on rules

The Technical image - computations of concepts - image with hallucinatory power, independent of rules

Object Recognition

Object recognition uses a dataset on which the machine has been trained. This dataset consists of images sorted by category, all labelled with the correct class. You can choose how often and for how long you train it. In general, the longer it trains, the better it will recognize. Only once it is trained can you really use the software. You can then load photos, videos or live video for it to recognize. The software is very fast and accurate. But in the end it all comes down to how it was trained and with what kind of dataset.

The idea that the computer can literally perceive without a human looking at the screen is remarkable. In certain contexts this can be very useful, for example with surveillance cameras or in searching for particular cells in the body (loading images of diseased cells and searching the body for such cells). But what makes it so interesting is that the output is based on images that people themselves have chosen. The images are only a reference for the computer, and this makes it possible to recognize multiple objects in one image. But what if you make the input (the dataset) your own? Images from your life. Then the computer looks the way you look. It does so with object recognition, which is of course completely inhuman, yet fed with personal human input. This way of letting a computer see is something a human would not really be able (or want) to do.

Experiments

You Only Look Once

real-time object detection

Predictions-blind-jw.png


Object Recognition

I used existing code to learn how object detection works, what kind of database it has and what the possibilities are. This kind of computer vision is used in self-driving cars, army drones, surveillance cameras and so on. It makes predictions based on what is in the database. There are several other databases that you can connect to it. COCO is a dataset with many more images, which this software can use to learn to identify more objects. I tried to get this software running in real time on my computer, but unfortunately my graphics card is not powerful enough (or I did something wrong). So now I'm trying to

I have now started on a simpler kind of software: motion detection. It can observe any movement in live camera feeds or videos. This movement can be detected in great detail, or more coarsely. RTOD-jw.png
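To make the idea of motion detection a bit more concrete, here is a minimal sketch of frame differencing in plain Python. It is not the software I used; the function names, the tiny hand-made frames and the `threshold` value are assumptions for illustration only. A lower threshold picks up smaller changes (more detail), a higher one only reacts to big changes.

```python
def motion_mask(prev_frame, next_frame, threshold=25):
    """Mark every pixel whose brightness changed by more than `threshold`.

    Frames are lists of rows of grayscale values (0-255).
    `threshold` is a hypothetical sensitivity setting, not a value from any tool.
    """
    return [
        [abs(b - a) > threshold for a, b in zip(row_prev, row_next)]
        for row_prev, row_next in zip(prev_frame, next_frame)
    ]


def motion_ratio(mask):
    """Fraction of pixels flagged as moving, between 0.0 and 1.0."""
    total = sum(len(row) for row in mask)
    moved = sum(sum(row) for row in mask)
    return moved / total if total else 0.0


# Two tiny 3x4 grayscale frames: one bright pixel shifts one step to the right.
frame_a = [
    [10, 10, 10, 10],
    [10, 200, 10, 10],
    [10, 10, 10, 10],
]
frame_b = [
    [10, 10, 10, 10],
    [10, 10, 200, 10],
    [10, 10, 10, 10],
]

mask = motion_mask(frame_a, frame_b)
print(mask[1])             # the middle row, where the change happened
print(motion_ratio(mask))  # share of the image that changed
```

Both the pixel the blob left and the pixel it moved to are flagged, which is why simple differencing tends to show a moving object twice.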


Finally I got the real-time object detection working. I did this on a Linux machine, because it is much faster than my MacBook Pro. What you can see is that it detects a lot of 'objects'. It draws a bounding box around every object it recognizes and gives a percentage for how sure the system is. This data can be used to monitor a given location at a given time. This kind of technique is being used by the Chinese government to supervise busy crossroads and check whether everyone is obeying the rules. People who don't end up on a blacklist. From a Western perspective, this idea of monitoring citizens is very strange and leaves no guarantee of privacy. This way of monitoring the world is tragically overanxious.
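A detection as described above is basically a label, a certainty and a box. The sketch below shows one possible way to hold that data and to drop uncertain detections. The `Detection` class, the `threshold` of 0.5 and the sample values are all made up for illustration; they are not taken from Darknet's output.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str         # class name predicted by the detector
    confidence: float  # how sure the system is, 0.0 to 1.0
    x: int             # left edge of the bounding box, in pixels
    y: int             # top edge of the bounding box, in pixels
    width: int
    height: int


def keep_confident(detections, threshold=0.5):
    """Keep only detections at or above a (hypothetical) certainty threshold."""
    return [d for d in detections if d.confidence >= threshold]


def describe(d):
    """Format a detection the way it is usually drawn next to a box."""
    return f"{d.label}: {d.confidence:.0%}"


# Invented sample detections, only to show the data flow.
sample = [
    Detection("person", 0.91, 40, 60, 80, 200),
    Detection("dog", 0.32, 300, 220, 90, 70),
    Detection("car", 0.77, 150, 180, 260, 120),
]

for d in keep_confident(sample):
    print(describe(d), (d.x, d.y, d.width, d.height))
```

Where the threshold is set decides how many 'objects' appear on screen: a low value shows many doubtful boxes, a high value shows only what the system is sure about.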


Bbox-selection-2018-04-03 16-39-15.png


BBOX tool is a tool for creating bounding boxes by hand. You can use it to create your own datasets for Darknet YOLO. I downloaded ±700 images of security webcams and marked on every image exactly where the cameras were. https://www.youtube.com/watch?v=aE1kA0Jy0Xg After labelling every camera in the ±700 images, I trained the computer to recognize the cameras. Unfortunately it didn't work. I think this is because the folder also contained pictures without a camera in them, so those images were not used. I then tried the dataset from the tutorial, and that one did work, partly because the computer had been trained for a very long time.

Nfpa-predictions0jw.jpg Now I know how to create my own dataset, and this knowledge opens up new ideas about what this software can do or become.
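Darknet YOLO reads its labels as one text line per object: a class number followed by the box center, width and height, all divided by the image size so they fall between 0 and 1. A hand-drawn box in pixels therefore has to be converted. The sketch below shows that conversion; the function name `to_yolo_line` and the assumption that the labelling tool gives corner coordinates (x1, y1, x2, y2) are mine and may not match the exact output of the BBOX tool.

```python
def to_yolo_line(class_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel box given by its corners into a YOLO label line.

    Output format: "<class> <x_center> <y_center> <width> <height>",
    with all four box values relative to the image size.
    """
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Example: a camera (class 0) drawn from (100, 50) to (300, 250)
# on an image of 400 x 500 pixels.
print(to_yolo_line(0, 100, 50, 300, 250, 400, 500))
# prints: 0 0.500000 0.300000 0.500000 0.400000
```

Each image gets a text file with the same name holding one such line per labelled object.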


Nightmare

darknet nightmare


Jw-jw-montage.jpg


Darknet nightmare is actually the same idea as object detection, but run backwards. The output you get is really beautiful. The machine reproduces the images, but they look as if they have been edited with a weird Photoshop brush. There are eyes everywhere and it looks really dreamy. Nobody expects this kind of image to come out. The idea of letting the computer create its own artworks is really cool. As a user you can also choose different segmentation options.

Virtual World recognition

Real-Time recognition

Densecap

[Densecap Github] This software uses Torch to create dense captions for an image. It makes assumptions that are close to what is in the image. All those captions come from a model trained on a huge dataset of labelled images. This makes it possible to get close to reality.

After a talk with Kim, I decided to create a workflow: how to do my experiments and what these experiments could become. The translation of the Titanic story is one way to create a work, but when I change the input and reshape the output, it starts to become more of a visual research documentation. I keep this workflow document in mind to understand where my experiments will go.

WORKFLOW.png

Below are the experiments based on Densecap and my workflow.

Movies

In this experiment I tried to summarize Titanic. I took 34 screenshots of the movie in chronological order and put them into Densecap. The computer created around 10 captions for each screenshot. After that I designed all the text as a book. When you read it, it is really vague what kind of story it is. Some of the captions are really good and specific, some are more general. What this could become is a poetic and weird story made out of movie screenshots.


Titanic-cover-jw.jpg


Titanic-page-1.jpg Page one


Titanic-page-2.jpg Page two


Napalm-dense.gif How to create a GIF like this:

# Convert the mp4 to jpgs
$ ffmpeg -ss 00:00:00 -t 00:00:22 -i <name-of-file>.mp4 -r 25.0 yourimage%4d.jpg

# Run all jpgs through Densecap
$ th run_model.lua -input_dir path/to/jpgfolder -box_width 3 -num_to_draw 4 -output_dir /path/to/output/folder

# Convert the jpgs to a gif
$ convert *.jpg <name>.gif

News stories

M-zuckerberg-dense.png

I took this picture of Mr Zuckerberg during his hearing from the internet (12-04-2018). I cut out every detection box and placed them one after another to get an idea of how the machine looks through the image. What will it see first? And does it detect every person in the image? When I first saw this picture, the first thing I noticed was Zuckerberg's pale, empty face. The machine has no focal point and simply looks at a picture in a different way.

Mr-zuckerberg-16dense-caps-7persons.gif


DECONSTRUCTING DENSECAP - INTERPRETATIONS OF THE MACHINE

All the detections in order from beginning to end

Mr-zuckerberg-frames 01.png Mr-zuckerberg-frames 02.png Mr-zuckerberg-frames 03.png Mr-zuckerberg-frames 04.png Mr-zuckerberg-frames 05.png Mr-zuckerberg-frames 06.png Mr-zuckerberg-frames 07.png Mr-zuckerberg-frames 08.png Mr-zuckerberg-frames 09.png Mr-zuckerberg-frames 10.png Mr-zuckerberg-frames 11.png Mr-zuckerberg-frames 12.png Mr-zuckerberg-frames 13.png Mr-zuckerberg-frames 14.png


^

|

|

|

|

These are 11 images that started from the first image. Each time, I cut out the first recognition and put it into Densecap again. It became an endless string of iterations of a yellow wall/background.

Mzf-1.jpg the shirt is yellow
Mzf-2.jpg a yellow wall
Mzf-3.jpg a yellow background
Mzf-4.jpg the yellow wall behind the cat
Mzf-5.jpg the sky is clear
Mzf-6.jpg the sky is clear
Mzf-7.jpg the wall is yellow
Mzf-8.png the wall is yellow
Mzf-9.jpg the wall is yellow
Mzf-10.jpg the wall is yellow
Mzf-11.jpg yellow and white background

Cropped images

23-04-18 Today I wrote a Python script to slice every detected object into a new jpg. So now you can save each detected object into a category. This creates a new kind of database that can be used by the computer. Maybe I can create new datasets from detected objects. I still have to write a piece of code that gives each cropped jpg the right title. The title should look like this: person_walking_on_the_sidewalk_23-04-18-20-30-34 = <name of object and what it is doing> _ <date> _

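import json
import os
from PIL import Image

# Load the Densecap output; each box is stored as [x, y, width, height].
with open("results.json") as f:
    data = json.load(f)

boxes = data["results"][0]["boxes"]  # boxes detected in the first image

# Image to crop; converted to RGB so the crops can be saved as JPEG.
img = Image.open("manhattan.png").convert("RGB")

os.makedirs("cropped", exist_ok=True)  # make sure the output folder exists

for i, box in enumerate(boxes[:100]):  # at most the first 100 boxes
    # Convert from (x, y, w, h) to corner coordinates (x1, y1, x2, y2).
    x1 = box[0]
    y1 = box[1]
    x2 = x1 + box[2]
    y2 = y1 + box[3]

    print(x1, y1, x2, y2)

    crop_img = img.crop((x1, y1, x2, y2))
    crop_img.save("cropped/img_%d.jpg" % i)  # save the cropped object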
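The title format described above (caption plus date) could be built roughly like this. This is only a sketch of the piece of code that still has to be written: the function name `crop_filename` is hypothetical, and it assumes the caption text for each box is available as a plain string, for example from a captions list in results.json.

```python
import re
from datetime import datetime


def crop_filename(caption, when, ext="jpg"):
    """Build a file name like person_walking_on_the_sidewalk_23-04-18-20-30-34.jpg.

    `caption` is the text the machine gave to the crop,
    `when` is the moment of detection (or of cropping).
    """
    # Lowercase the caption and replace anything that is not a letter
    # or digit with an underscore, so the name is safe on disk.
    slug = re.sub(r"[^a-z0-9]+", "_", caption.lower()).strip("_")
    # Day-month-year-hour-minute-second, matching the example title.
    stamp = when.strftime("%d-%m-%y-%H-%M-%S")
    return f"{slug}_{stamp}.{ext}"


print(crop_filename("person walking on the sidewalk",
                    datetime(2018, 4, 23, 20, 30, 34)))
# prints: person_walking_on_the_sidewalk_23-04-18-20-30-34.jpg
```

In the cropping loop, the returned name would take the place of the numbered img_%d.jpg name.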

To create this overview of cropped images, go to the right folder and run:

$ montage *.jpg -tile 10x10 -background "#000000" montage.jpg

Original-manhattan-jw.png Montage-manhattan-jw.jpg

Original-tunnel-jw.jpg Montage-tunnel-jw.jpg

Fairy Tales

Art

Live Footage

Mask R-CNN

Balloon

Mask R-CNN is open-source software similar to Darknet YOLO. It is slower and has fewer classes in its dataset, but it can detect in much more detail. For example, it recognizes a person, but it also detects all the pixels that belong to that person. That is why it is called Mask R-CNN. This way it can cut out movement, isolate it, duplicate it, etc.

I first tried the demo that comes with the software and it worked! This demo can only detect balloons, mask them and turn everything that is not a balloon into grayscale. I'm not sure whether this is the software I'm going to use, but I like it a lot.
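The effect of the balloon demo (color inside the mask, grayscale outside) can be imitated on a tiny image to see what a pixel mask does. This is a simplified sketch in plain Python, not the code of the demo itself: the function name `color_splash`, the 2x2 image and the mask are invented, and the gray value is a plain average of red, green and blue rather than a proper luminance formula.

```python
def color_splash(pixels, mask):
    """Keep color where the mask is True, turn everything else gray.

    `pixels` is a list of rows of (r, g, b) tuples,
    `mask` is a list of rows of booleans with the same shape.
    """
    out = []
    for row, mask_row in zip(pixels, mask):
        out_row = []
        for (r, g, b), inside in zip(row, mask_row):
            if inside:
                out_row.append((r, g, b))           # pixel belongs to the object
            else:
                gray = (r + g + b) // 3             # simple average as gray value
                out_row.append((gray, gray, gray))
        out.append(out_row)
    return out


# Invented 2x2 image: the red pixels stand in for the balloon.
pixels = [
    [(255, 0, 0), (0, 90, 30)],
    [(30, 60, 90), (200, 10, 10)],
]
mask = [
    [True, False],
    [False, True],
]

for row in color_splash(pixels, mask):
    print(row)
```

Because the mask is per pixel and not per box, the same mask could also be used to cut the object out or paste it somewhere else.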

source img

Ballon1-jw.jpg Balloon1.png

output img

Ballon2-jw.png Balloon2.png

Insights from Experimentation

Artistic/Design Principles

Artistic/Design Proposal

Realised work

Final Conclusions

Bibliography