Difference between revisions of "Computer vision narratives"
| Line 13: | Line 13: | ||
| Can I built a world where there is no trace of reality? (AR/VR) | Can I built a world where there is no trace of reality? (AR/VR) | ||
| − | Can I create a machine that is creating stories out of | + | Can I create a machine that is creating stories out of what he is seeing? | 
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| + | What if a machine can impose actions to a person? | ||
| + | What if a machine (can recognize a person and) can give commands to a person? | ||
| =Relevance of the Topic= | =Relevance of the Topic= | ||
Revision as of 10:22, 19 April 2018
Contents
Forward/Introduction
Abstract
Central Question
How can I insinuate how machines/camera's/humans unasked monitoring what you're doing?
Can the machine see more than humans do? (create a fake world)
Can I built a world where there is no trace of reality? (AR/VR)
Can I create a machine that is creating stories out of what he is seeing?
What if a machine can impose actions to a person? What if a machine (can recognize a person and) can give commands to a person?
Relevance of the Topic
Hypothesis
Research Approach
Key References
Netherlands
The police is testing with camera's the highways to check if people obey the law. By using object recognition they can recognise the car, liscense-plate and windshield. The camera images from drivers who obey the law, will be destroyed immediately. For dutch principles that's good, but if you want to improve the object-recognition software, you also need images from drivers who doing it right so you get two groups. It is about the decision between technical improvement of privacy of citizens.
China
In China they do it way smarter than in the Netherlands. They keep all the infomation they get to improve their systems. Technical improvement is a way higher priority than privacy, so in China's case the decision is already made.
Sleepwet
Literature
4-D internet. Het internet wat we nu kennen is natuurlijk online aanwezig maar ook offline. Bijvoorbeeld iets uploaden wat tegen de regels is, waardoor er de volgende dag iemand op de stoep staat. Eten bestellen waardoor er binnen 20 minuten een mens op een fiets voor je deur staat met je eten. Deze manier van leven is het internet niet te ontwijken. Overal is de confrontatie aanwezig. Het internet is niet meer alleen een raam naar de online wereld, maar de hele omgeving is het internet geworden. Dit is hoe het tegenwoordig in elkaar zit, naast dat kinderen in groep 3/4 les krijgen in hun eigen moeder taal wordt het ook noodzakelijk om de taal van de computer te spreken. Lessen in coderen. Ik ben van mening dat dit voor de volgende generatie mensen echt belangrijk is omdat alleen dan deze nieuwe generatie niet onwetend wordt van wat voor kracht de machines in ons dagelijks leven hebben. Als de nieuwe generatie niet door heeft hoe deze machines te werk gaan lijdt dit tot een on
Narrative intelligence.
De mens geeft betekenis aan de wereld door middel van verhalen. Mensen zijn nieuwsgierig naar hoe andere mensen hebben geleefd, en vertellen dergelijke verhalen aan elkaar door.(wellicht met hun eigen interpretatie er bij zodat het verhaal steeds een beetje vervormt.) Om machines betekenis te laten geven aan de wereld zijn er ook veel mensen bezig om machines verhalen te laten lezen, leren te begrijpen en vervolgens zelf verhalen te laten schrijven. Narratief kan er voor zorgen om mensen ergens mentaal mee naar toe te nemen en voor iedereen is dit verhaal in hun hoofd anders. Iedereen geeft een andere kleur, vorm of patroon aan het verhaal, en zo genereert dat voor dat specifieke persoon een eigen werkelijkheid. Narratief kan ook een leiding hebben over iemands leven, denk aan religie waar elk verhaal betekenis heeft hoe men zich zou moeten gedragen. Machines hebben daar geen benul van wat een verhaal kan doen. Eerder haalde ik al the Stanley Parable aan, maar dit geeft in weze de essentie van lezer en  verteller. Die grens vervaagt in dat spel, je bent eigenlijk in gesprek met elkaar, niet verbaal maar mentaal. Hierdoor ontstaat er een zeer groot web aan verhaal lijnen die oneindig lijken en door elkaar heen vloeien. 
Object detection. The problem is not just about solving the 'what?', it's also about solving the 'where?'
The difference between traditional and technical images, then, would be this: the first are observations of objects, the second computations of concepts. The first arise through depiction, the second through a peculiar hallucinatory power that has lost its faith in rules. This essay will discuss that hallucinatory power. - V. Flusser
The Traditional image - observation of objects/concepts - depiction of image dependent of rules
The Technical image - computations of concepts - image with hallucinatory power, independent of rules
Object Recognition
Met object recognition word gebruik gemaakt van een dataset waar de machine op getraind is. Deze dataset bestaat uit plaatjes gesorteerd op categorie en zijn allemaal gelabeld in de juiste class. Je kan zelf kiezen hoe vaak en hoe lang je hem laat trainen. Hoe langer, hoe beter hij zal herkennen. Zodra hij getraind is kun je de software pas echt gebruiken. Je kan dan foto’s, video’s of live video inladen om te laten herkennen. De software is zeer snel en accuraat. Maar uiteindelijk gaat het allemaal om hoe hij getraind is en met wat voor dataset.
Het idee dat de computer letterlijk kan waarnemen zonder dat er een mens naar het scherm kijkt is bijzonder. In bepaalde contexten kan dit heel handig zijn, zoals met bewakings camera’s, zoektochten naar bepaalde cellen in het lichaam (afbeeldingen van ziektecellen inladen en in het lichaam zoeken naar zulke cellen) maar wat het zo interessant maakt is dat de output gebaseerd is op afbeeldingen die mensen zelf hebben gekozen. De afbeeldingen zijn slechts een referentie voor de computer en zo is het mogelijk om meerdere objecten in een beeld te herkennen. Maar wat als je de input (dataset) eigen maakt? Afbeeldingen uit jou leven. Dan kijkt de computer op de manier hoe jij kijkt. Maar dan wel met object recognition, wat natuurlijk totaal onmenselijk is, maar wel met persoonlijke input van de mens. Deze manier van een computer laten zien is iets wat de mens niet echt zou (willen) kunnen.
Experiments
You Only Look Once
Object Recognition
I used a existed code to learn to understand how object detection works, what kind of database it has and what the possibilities are. This kind of computer vision is used in self-driving cars, army drones, surveillance camera's and so on. It makes predictions based on what's in the database. There are several other databases which you can connect to it. COCO is a library that way more images that this software can use to learn to identify more objects. I tried to get this software in real-time on my computer but unfortunately, my graphic card is too low. (or I did something wrong) So now I'm trying to
Currently I started on a simpeler kind of software, motion detection. It can observe any movement in live camera's or video's. This movement can be detected in great detail, but also in less.
 
Finally I got the real-time object detection working. I did this on a linux machine, because this one is way faster than my Macbook Pro. So what you can see is that it detects a lot of 'objects'. It's drawing bounding boxes on every object it recognise. Also it gives a percentage of how sure the system is. This data can be used to monitoring a current location in a current time. This kind of technique is being used by the chinese goverment to supervise busy crossroads to check if everyone is obeying the rules. If some people don't, they will be on a blacklist. This idea of monitoring citizens from a western perspective is super weird and guarantees no privacy. This way of monitoring the world is tragically overanxious.
BBOX tool is a tool to create bounding boxes by hand. You can use it to create your own datasets voor Darknet YOLO. I downloaded ±700 images of security webcams and I selected on every image where the cams were exactly. https://www.youtube.com/watch?v=aE1kA0Jy0Xg So after labelling every camera in an image (±700 images) I trained the computer to recognize the camera's. Unfortunately it didn't work. I think because i had also pics in the folder that didn't had a camera in the image, so those images were not used. So I tried the dataset of the tutorial and this dataset did work. Also because the computer is trained very long. 
 Now I know how I can create my own dataset and with this knowledge opens up new ideas about what this software can do or can be.
Now I know how I can create my own dataset and with this knowledge opens up new ideas about what this software can do or can be.
Nightmare
is actually the same idea of how object detection works, but then backwards. And the output you'll get is really beautiful. The machine reproduce the images, but it look like it's getting editted by a weird photoshop brush. It got eyes everywhere and it looks really dreamy. Nobody expect that this kind of images come out. The idea of letting the computer create their own artworks is really cool. Also you can as user use different segmentation options.
Virtual World recognition
Real-Time recognition
Densecap
[Densecap Github] This software can create dense captions by image by using torch. So it creates assumptions that are close to what is on the image. All those captions are generated by a huge dataset of labelled images. This makes it possible to get close to reality.
After a talk with Kim, I decided to create a workflow. How to do my experiments and what this experiments could become. This translation of this titanic story is one of the ways to create a work, but when I change the input and reshape the output, it's starting to get more a visual research documentation. This workflow doc I keep in mind to understand where my experiments will go.
Down here starting the experiments based on Densecap and my workflow.
Movies
During this experiment I tried to summarize The Titanic. I took in chronological order 34 screenshots of the movie and I put them in Densecap. The computer created around 10 captions of each screenshot. After that I designed all the text as a book. When you read it, it is really vague what kind of story it is. Some of captions are really good and specific, some are more general. What this could become is a poetic and weird story made out of movie screenshots.
#Convert mp4 to jpgs $ ffmpeg -ss 00:00:00 -t 00:00:22 -i <name-of-file>.mp4 -r 25.0 yourimage%4d.jpg
#Put all jpgs into densecap $ th run_model.lua -input_dir path/to/jpgfolder -box_width 3 -num_to_draw 4 -output_dir /path/to/output/folder
#Convert jpg to gif $ convert *.jpg <name>.gif
News stories
I took this picture from internet of Mr Zuckerberg during court. (12-04-2018) Every detection box I cut out and put it after eacht other to get an idea how the machine is looking through the image. What will he see first? and does he detect every person on the image? When I saw this picture in the first place, the first thing I saw was that pale, empty face of Zuckerberg. This machine has no focus point and just watch a picture a different way.
DECONSTRUCTING DENSECAP - INTERPRATATIONS OF THE MACHINE
All the detections in order from begin to end
^
|
|
|
|
These are 11 images that started from the first image. Each first recognition I cutted out and put it again in Densecap. It became an endless string of iterations, of a yellow wall/background.
 the shirt is yellow
 the shirt is yellow 
 a yellow wall
 a yellow wall 
 a yellow background
 a yellow background
 the yellow wall behind the cat
 the yellow wall behind the cat
 the sky is clear
 the sky is clear
 the sky is clear
 the sky is clear
 the wall is yellow
 the wall is yellow
 the wall is yellow
 the wall is yellow
 the wall is yellow
 the wall is yellow
 the wall is yellow
 the wall is yellow
 yellow and white background
 yellow and white background
Fairy Tales
Art
Live Footage
Mask R-CNN
Balloon
Mask R-CNN is open source software similair to darknet YOLO. It's slower and has less classes in its dataset. But it can detect waaaay more in detail. It recognize for example a person, but it also detects all the pixels that belong to the person. So that's why it called Mask R-CNN. This way it can cut out movement, isolate it, duplicate it, enz..
I first tried the demo in the software and it worked! This demo only can detect balloons, mask them and put everything that's not a balloon in grayscale. I'm not sure if this is gonna be the software that I'm gonna use, but I like it a lot
source img
output img





























